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.  INTRODUCTION 


This  paper  deals  with  the  identification/ approximation  of  a 
linear  system  by  poles  and  residues  from  a  measured  finite  length 
input-output  record  of  the  system.  The  objective  of  this  paper 
is  threefold: 

1)  to  illustrate  that  several  different  formulations  for 
characterizing  the  impulse  response  of  a  system  yield  the  same  set 
of  poles; 

2)  to  show  how  different  formulations  regularize  the  ill-posed 
system  identification  problem,  and 

3)  to  demonstrate  that  relatively  stable,  consistent  and  re¬ 
liable  results  in  the  identification  of  a  system  by  poles  and  resi¬ 
dues  from  a  finite  length  input-output  record  can  be  achieved  by 
the  pencil-of-f unctions  method. 

Recognition  that  different  formulations  yield  the  same  poles 
is  not  widely  appreciated.  In  this  paper  it  is  shown  that  formu¬ 
lations  based  upon  different  assumptions  result  in  identical  sets 
of  analysis  equations.  For  example,  it  is  not  at  all  obvious  that 
Prony's  method  (as  derived  by  most  numerical  analysis  formultions) 
may  be  interpreted  in  terms  of  predicting  each  value  of  data  and 
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therefore,  is  a  fora  of  digital  Wiener  filter.  Markel  [1]  and  Markel 
and  Gray  [2]  did  recognire  the  fact  that  identical  analysis  equations 
can  be  obtained  by  several  different  techniques. 

In  our  discussion  of  the  various  approaches,  only  references 
which  are  directly  relevant  are  noted.  Mo  attempt  has  been  made 
to  cite  the  earliest  sources.  In  many  cases,  additional  references 
may  be  found  in  papers  mentioned. 

The  problem  of  interest  is  to  identify/ approximate  the  transfer 
function  of  a  system  by  its  poles  and  residues  when  the  noise  contaminated 
input  and  output  are  specified.  The  signal  and  noise  are  considered  to 
be  stationary  processes.  When  time  limited  signals  are  involved, 
the  problem  is  converted  to  an  equivalent  stationary  problem  by  con¬ 
volving  the  time  limited  signal  with  white  noise  of  unit  power  [3]. 

In  the  second  section  the  classical  method  for  extracting  the  signal 
from  noise  is  discussed.  This  is  the  Wieaer-Kolmogoroff  theory  [4], 

The  digital  form  of  the  Wiener-Hopf  equation  is  derived.  Topics 
associated  with  Wiener  filters  are  also  presented.  They  include  in¬ 
verse  filter  design,  linear  prediction,  predictive  deconvolution  Cor 
spiking)  filter  design,  recursive  filter  design  and  Kalman  filtering. 

1  J.D.  Markel,  "Formant  Trajectory  Estimation  from  a  Linear  Least-Square 
Inverse  Filter  Formulation,"  SCRL-Mono graph  7,  Speech  Communications 
Research  Laboratory,  Inc.,  Santa  Barbara,  October  1971. 

2  J.D.  Markel  and  A.H.  Gray,  "Linear  Prediction  of  Speech,"  Springer- 
Verlag:  Berlin. 

3.E.A.  Robinson  and  S.  Traital,  "Principles  of  Digital  ’Wiener  Filtering/' 
Geophysical  Prospecting,  September  1967,  pp.  311-333. 

4  N.  Levinson,  "The  Wiener  RMS  Error  Criterion  in  Filter  Design  and 
Prediction,"  Journal  of  Mathematics  and  Physics,  1947  V.  25,  pp. 

261-278. 
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Both  the  popularly  known  covariance  and  autocorrelation  methods  are 
derived  from  the  Wiener  filter  theory. 

In  the  third  section  the  various  well-posed  stochastic  exten¬ 
sions  of  an  ill-posed  system  identification/approximation  problem 
are  described.  They  include  the  maximum  likelihood  estimation 
theory,  the  minimum  predictor  error  variance  and  the  maximum  entropy 
spectral  analysis.  It  is  demonstrated  that  identical  analysis 
equations  for  parametric  modeling  of  the  system  can  be  obtained. 

The  fourth  section  provides  Prony's  method  in  various  forms. 

In  particular,  when  a  semi-least  squares  approach  is  applied  to  Prony's 
method,  both  the  autocorrelation  and  the  covariance  method  appear 
as  special  cases.  Thus  it  is  also  a  form  of  a  digital  Wiener  filter. 

The  second  objective  of  this  paper  is  discussed  in  the  fifth 
section.  This  section  discusses  the  various  concepts  of  ill-posed 
and  well-posed  problems  in  system  identification.  It  is  shown  how 
the  different  techniques  regularize  the  ill-posed  system  identifica¬ 
tion  problem  by  introducing  further  limitations  on  the  solution. 
Finally,  it  is  shown  how  the  pencil-of-functions  method  •  radically 
differs  from  the  other  formulations. 

Finally,  the  third  objective  is  demonstrated  in  section  VI  where 
results  are  presented  to  demonstrate  the  claim  that  relatively  stable, 
reliable  and  consistent  results  are  obtained  for  the  location  of  the 
poles  by  the  pencil-of-function  method. 
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II.  WIENER  FILTER  THEORY 

Kolmogoroff  (1942)  and  Wiener  (1943)  were  the  first  to  present 
a  unified  theory  on  extrapolation,  interpolation  and  smoothing  of 
stationary  time  series.  The  linear  filter  which  performs  the  desired 
task  is  obtained  by  the  solution  of  an  integral  equation  known  as  the 
Wiener-Hopf  equation  [4].  For  sampled  data  systems,  the  integral 
form  of  the  Wiener-Hopf  equation  reduces  to  a  finite  sum.  The  present 
treatment  describes  how  Wiener's  concepts  can  be  applied  to  the  identi¬ 
fication/approximation  of  linear  systems.  The  basic  model  for  this 
process  consists  of  an  input  signal,  a  desired  output  signal  and  an 
actual  output  signal.  If  one  minimizes  the  mean-squared  error  between 
the  desired  output  signal  and  the  actual  output  signal,  it  becomes 

possible  to  solve  for  the  optimum  system  commonly  known  as  the  "Wiener" 

filter.  The  fundamental  assumption  underlying  the  procedure  is  that 
all  processes  are  stationary. 

A  stationary  time  series  is  one  whose  statistical  properties 
are  time  invariant.  In  particular,  the  statistics  of  the  time  series 
are  independent  of  time.  By  definition,  a  stationary  time  series 
must  be  of  infinite  duration.  However,  in  an  actual  experiment,  we 
observe  a  times  series  over  a  finite  interval.  In  order  to  apply  the 
concepts  of  Wiener  filtering,  the  finite  length  time  series  is  convolved 

with  a  white  noi9e  series  of  unit  power,  to  yield  a  stationary  time  series 

[3].  Moreover,  in  actual  measurements  only  one  waveform  is  often  available 
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for  computing  various  statistics.  Thus  ensemble  averages  are  frequently 
evaluated  by  means  of  appropriate  time  averages.  This  is  valid  only 
when  the  random  processes  are  ergodic. 

The  concept  of  Wiener  filtering  is  well  known  but  is  included 
here  for  completeness.  The  fundamental  elements  of  digital  Wiener 
filtering  are  summarized  in  Figure  1  for  the  sampled  data  problem. 


t  -  0,l,2,...,K+M-2 


Figure  1.  Principles  cf  Wiener  Filtering. 


Given  the  input  sequence  xt  and  a  desired  output  sequence  d^,  the  problem 
is  to  find  the  linear  filter  coefficients  ft,  whose  output  sequence 
x  *  ^enotes  convolution]  yields  a  minimum  mean-squared  error 

estimate  of  d  .  If  E  { • }  denotes  the  expected  value,  then  the  error 

I  -  E{(dc  -  yt)2}  (2.1) 

is  to  be  minimized. 

In  this  case  an  M-length  filter  f£  -  {fg,f^,  •••»^h-1^  converts, 
in  a  least  error  energy, sense  a  K-length  input  x£  -  (xq,x^, . . . ,Xj._^} 
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Into  a  K+M-l  length  sequence  y^  which  approximates  the  desired  sequence 

d  -  (d_,  d  ,  ....  d,, The  actual  output  is  obtained  as 
t  '0  1  l 

yc  “  'y0*yl*  •••*  yK-rW-2" 


M-l 

-  :  T-0  T  :_T 


(.2.2) 


The  problem  of  making  the  actual  output  y^  as  close  as  possible  to  the 
desired  output  d^  can  be  interpreted  in  terms  of  minimi zing  the  error  energy 

2  mTx 

I  -  E{  (d  -  v  )  }  ■=  E{(d_  -  >  f_x  C2.3) 

t  -t  t  t-0  1  C~- 


The  error  is  minimized  by  evaluating  the  partial  derivatives  of  I  with 
respect  to  f_  and  equating  them  to  zero.  This  results  in  a  set  of 

L 

equations 


df-  E{2(dt 


M-l 

T-0 


M-l 

r 

■0 

for  j  •  0,1,2,  . . . ,M-1 


-2E(d  x  . }  +  2  7  f^E{x  _x  } 

t  t-2  T;0  T  t-T  t_2 


(2.4) 


The  unknown  filter  coefficients  are  obtained  by  solving  the  following 
set  of  simultaneous  equations. 

M-l 

‘  E{itW 

for  j  “  0,1,  ...»  M-l  (2.5) 


In  order  to  solve  the  above  equations,  it  is  necessary  to  compute 
the  expected  values  in  Equation  (2.5).  By  assuming  that  the  ensemble 
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averages  can  be  evaluated  in  terms  of  time  averages,  one  obtains 

1  K-l  ,  , 

*  —  —  -  'X  x  .  *  C  . 

t-T  t-j  K  -  M  ~M  t-T  t-j 


E{x  _x 


C-.o; 


(in  the  covariance  method) 
K-l+M 


x  x 


t-0  ‘  '  1 

K- ; T-j i-1 

I 

t-0 


:-3 


Vt+h-j! 


(T-j) 


(2.7) 


Similar lv 


(in  the  autocorrelation  method) 


E{d  x 


1  1 


K-l 


_  >  d  x 

t  t-j  K  -  M  t-M  t  t-j 


(in  the  covariance  method) 


,  .  K-l+M 

4  *  Jo  ^ 


(2.9) 


(in  the  autocorrelation  method) 


It  is  important  to  stress  that  the  terms  "covariance"  and  "autocorre¬ 
lation"  are  not  based  upon  the  standard  usage  of  the  terms  as  occur 
in  the  theory  of  stochastic  processes.  Rather,  we  follow  the  usage 
which  is  quite  prevalent  in  the  speech  processing  literature  [5], 

The  following  discussion  is  intended  to  clarify  their  interpretation. 

It  is  clear  that  the  covariance  definition  given  by  (2.6)  yield 
an  unbiased  estimate  since 


5  J.  Makhoul,  "Linear  Prediction:  A  Tutorial  Review,”  Proc.  IEEE 
Vol.  63,  No.  U,  April  1975,  pp.  561-580. 
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K-l 


EtE'.x_  _x_  -]  «  it- 


K  -  M  - 


c-M 


n.x  _x  .  ;  =  E;x  x 
t-j  t-T  ; 


On  the  other  hand,  the  autocorrelation  definition  given  by  (2.7) 
results  in  a  biased  estimate  since 


i  K-l+M 

£[E{xt__xt_j}]  *  El|  ^  xt~rXt-jj 


K-!T-j!-l 


t*0 


E-.x  x 

t  t+ :  T  — 


-  -^r--{xt-Txt-j 


Since  (t-J)  £  M,  it  is  interesting  to  note  that  the  bias  is  negligible 
when  M  (the  order  of  the  filter)  is  much  less  than  R.  The  bias  is 
significant,  however,  when  K  is  only  slightly  larger  than  M..  This 
explains  why  a  large  number  of  data  points  (K  >>  M)  is  necessary  for 
unbiased  spectral  estimation  when  using  the  autocorrelation  function 
to  obtain  the  power  spectral  density. 

Because  the  covariance  method  gives  an  unbiased  estimate,  it 
might  be  assumed  that  it  is  a  better  estimate.  However,  the  biased 
estimate  provided  by  the  autocorrelation  method  is  often  preferable. 

As  an  example  consider  the  zero  mean  four  data  point  sequence  (4,  -2, 

-1,  -1)  and  M  ■  2.  Using  the  unbiased  estimator,  the  expected  values 
are  «  1,  ■  1.5,  “  -1,  C^q  *  -2.  In  contrast,  the  biased 

estimator  results  in  r(Q)  -  5.5,  r(1)  -  -1.25,  r(2)  -  -0.5,  r(3)  -  -1.0. 
Note  that  CqQ  is  less  than  whereas  r^Q.  is  guaranteed  to  be  greater 
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chan  r  for  t  >  0. 

it) 

Continuing  with  our  development,  the  discrete  i’iener-Hopf  equa¬ 
tion  presented  in  \  2 . 5 ;  can  be  written  in  the  foliovinc  matrix  fort 
for  the  covariance  method 


iC  1  [F  '  ■  [D  ' 

i j ' MxM  i " Mrl  j " Mxl 


where  [C  ]  is  a  square  matrix  whose  elements  are  given  as 

K-l 


-  _  Vi1:-- 


(2.10) 


C2.ll) 


[r .]  is  a  column  matrix  consisting  of  the  M  unknown  filter  coefficients 


"O’ 


f, ,  . ...fj,  ,,  and  [D  ]  is  a  column  matrix  whose  elements  are  given 
1  M—  *  J 


ov 


K-l 

D,  “  1  d-Xr-l 

J  t-M  w  C  J 


(2.12) 


For  the  autocorrelation  method,  the  unknown  filter  coefficients  are 

obtained  from  the  solution  of  the  matrix  equation 

[R. ,  .,]  [F  ]  -  [D,]  C2.13) 

■  J  MxM  1  Mxl  -1  Mxl 

where  [  R ; j  ]  is  a  square  matrix  whose  elements  are  given  as 


K-l+M  K-l- 1 i-j ! 

Vj|-j0W«n*  tl0  xccr+li-j 


(2.14) 


and  [Dj ]  is  a  column  matrix  defined  by  (12). 

Interestingly,  most  of  the  formulations  for  the  solution  of  an 
unknown  linear  filter  lead  to  analysis  equations  which  can  be  formu¬ 
lated  either  in  terms  of  the  autocorrelation  matrix  equations  (2.13) 
or  in  terms  of  the  covariance  matrix  equations  (2.10). 
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It  is  important  to  note  that  the  Wiener  filter  is  not  always 
realizable  as  a  causal  rational  function  (in  terms  of  poles  and  zeros). 

As  a  result,  the  Wiener  f titer  is,  in  general,  an  infinite  order  filter. 
When  the  order  of  the  filter  is  specified  a  priori,  the  resulting  fil¬ 
ter  may  no  longer  be  optimum. 

Various  forms  of  the  Wiener  filter  have  appeared  under  different 
names  and  have  been  used  in  various  geophysical,  speech  processing, 
and  digital  signal  processing  applications.  Next,  various  modifica¬ 
tions  of  the  'Wiener  filter  are  presented. 

2.1.  Inverse  Filter  Formulation 

The  inverse  filter  attempts  to  transform  the  input  signal  into 
an  impulse  [6].  Assume  that  the  input  sequence  is  transformed  to 
an  impulse  of  area  c  by  an  all-zero  filter  of  the  form 


F(z)  -  y  fiz  i,  with  fQ  -  1  (2.15) 

i*0 

In  terms  of  Figure  1,  the  desired  waveform  d^  is  an  impulse  of  area  Z. 
It  follows  that  the  coefficients  of  the  filter  should  be  chosen  such 
that 

M— 1 

1  -  “  o<5^  (2.16) 

T^O  1  t_l 

where  C£  ■  1  for  t  ■  0  and  zero  otherwise.  Multiplication  of  both 


6  J.D.  Markel,  "Digital  Inverse  Filtering  -  A  New  Tool  for  Formant 
Trajectory  Estimation,"  IEEE  Trans,  on  Audio  and  Electroacoustics, 
Vol.  AU-18,  No.  2,  June  1970,  pp.  137-141. 
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sides 


ov  x  .  anc  summing 
'  t-j 


from  k  to  K- !-*-£,  one  obtains 


M-l 

T-0 


K-l+£ 


(2.17) 


Since  x  is  zero  for  j  >  0,  the  unknown  coefficients  f_  for  T  -  1,2, 
...,  M-l  are  obtained  from  the  solution  of  the  following  equations 


M-l  K-l+i, 

7  f  [  7  x  x  .]  -  0  (2.18) 

T-0  T  t-k  t-J 


for  j  -  1,2,  ....  M-l. 

For  k  -  M  and  £  -  0 ,  the  above  equations  reduce  to  that  of  the  covariance 
equations  as  in  (2.10),  This  is  because 


K-l 

1  5tXt-j 
t-M  J 


0. 


'J  t-M  C2-19) 

For  k  -  0  and  £  -  M,  equation  (2.18)  reduces  to  that  of  the  auto¬ 


correlation  equations.  If  the  input  signal  is  approximated  by  an  all¬ 
pole  model  the  poles  of  the  input  signal  are  obtained  from  the  zeros 


of  F(z)  i.e.  from  the  solution  of  the  polynomial  equations 
M-l 

I  fTz~‘  "  °* 

T-0  1 


(2.20) 


In  particular,  if  (z  is  the  ith  root  of  the  above  equation  (2.20), 
then  the  ith  pole  is  equal  to  £n[(z  ^)  ]. 


2.2  Linear  Prediction 

The  term  "linear  Prediction"  was  first  used  by  Wiener  in  his 
classic  work  on  prediction  of  stationary  time  series.  Since  its  publi¬ 
cation,  it  has  found  wide  application  in  the  determination  of  all-pole 
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models  for  the  processing  of  speech  signals  [5]. 

The  basic  philosophy  here  is  to  take  a  part  of  the  sampled  wave¬ 
form  (say  the  first  M-l  points  from  K  data  points)  and  predict  the 
next  data  point  on  the  waveform  by  proper  choice  of  the  predictor  co¬ 
efficients  a...  The  linear  predictor  of  step  size  one  predicts  the 
Mth  data  point  of  the  waveform  when  a  (M-l)  order  predictor  filter  is 
chosen.  In  the  time  domain,  the  predicted  sample  is  given  by 


M-l 

y 

i-1 


aiVi 


where  (-a^,-a^ ,-a^ , . . . )  are  the  predictor  coefficients.  Assuming 
the  signal  spectrum  is  to  be  modelled  by  an  all-pole  model,  the  co¬ 
efficients  a^  are  the  negative  of  the  values  of  f ^  presented  in  (2.15). 
(M-l)  order  linear  predictor  thus  requires  a  linear  combination  of  the 
previous  (M-l)  samples.  The  error  is  then  given  by 


M-l 

*M  -  *M  +  *M  "  '  J1  aiXM-i  +  *M 


A 


M-l 

"  -  l  aixM_i  a0  ’  -1  (2.21) 

When  (K-l+£)  different  data  points  are  predicted,  the  total  squared 
error  is  defined  by 


I 


K-l+i  ,  K-l+1  M-l  - 

I  [e  }  -III  a  x  ] 
t-k  z  t-k  i-0 


M-l  M-l  K-l+2. 

I  l  Vj{  l  xt-iXt-j} 

i-0  j-0  J  t-k  c  1  c  J 

M-l  M-l  K-l+i 

I  l  fifj (  I  xc-iXt-1}* 

i-0  j-0  3  t-k  c  1  c  0 


(2.22) 


Minimization  of  I,  with  respect  to  the  set  of  the  filter  coefficients 
leads  to  a  set  of  simultaneous  equations  given  by 
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(2.23) 


r 

i 


i-0 


K-1+* 


L 

t-k 


x  .x. 

t-l  t-J 


-  0 


for  j  ■  1,  2,  ....  M  -  1,  from  which  the  unknown  filter  coefficients 
f  are  obtained.  When  k  *  M  and  2*0,  this  amounts  to  minimization 
of  the  error  only  over  data  points  of  the  waveform  from  M  to  K  •  1. 

When  k  ■  0  and  l  *  M,  this  implies  minimization  over  the  entire  wave¬ 
form.  Equations  (2.23)  are  identical  to  the  set  of  equations  (2.19) 
obtained  in  the  case  of  the  inverse  filter  formulation  in  the  previous 
section. 

Linear  prediction  is  thus  equivalent  to  an  all-pole  model  for 
the  spectrum  of  the  input  The  poles  for  this  model  are  again 

obtained  from  the  solution  of  the  polynomial  equation 
M-l  , 

I  0. 

i*0  1 

Thus  if  represents  the  measured  impulse  response  of  the 
system,  the  set  of  poles  obtained  by  linear  prediction  can  be  interpreted 
so  as  to  parameterize  the  system  in  terms  of  an  all-pole  model. 


2.3  Predictive  Deconvolution  or  Spiking  Filter  Design 

The  general  linear  filtering  problem  involves  the  input  ,  the 
impulse  response  h  and  the  output  y£.  When  j.z  is  desirable  to  evalu¬ 
ate  h£  given  x„  and  y  ,  the  problem  is  referred  to  as  deconvolution. 

In  this  sense  the  inverse  filter  problem  discussed  in  section  2.1  is 
a  deconvolution  problem.  Predictive  deconvolution  refers  to  the 
case  in  which  the  output  y  is  assumed  to  be  a  delayed  impulse 
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such  that 


M-l 

;  f  X  •  C  C 

7-0  T  C_‘  t~<^1 


(2.24) 


The  unknown  filter  coefficients  f  can  be  pursued  in  a  manner  anala- 

T 

gous  to  the  inverse  filter  approach  by  assuming  the  input  to  be  re¬ 
presented  as  an  all-pole  model.  However,  we  prefer  to  show  that  the 
filter  coefficients  can  also  be  determined  by  interpreting  the  prob¬ 
lem  as  a  prediction  problem.  It  is  in  this  sense  that  the  term  pre¬ 
dictive  deconvolution  is  used  [7], 

Introduce  the  change  of  variables  t  -  a+1  ■  S  in  equation  (2.24). 
The  resulting  equation  is 


M-l 

)  f  x  .  -  a 5„. 

T;0  e 


(2.25) 


Next,  assume  the  filter  coefficients  to  be  given  by 

( 


ot— 1  zeros 


(—1,0,0,  • « « ,  0,  —  a^ ,  —  &2 »  •••»  )  (2.26) 

The  upper  limit  on  the  summation  is  now  given  by  M  +  a  -  2.  Equation 
(2.25)  can  now  be  written  as 


M+o-2 

M-l 

x  +  J 

t+a-1  4 

T-l 

*rxc+a-T-l  xt+a-l  ST*t-T 

■ 

(2.27) 

If  the  estimate  of  x  is  assumed  to  be  given  bv  x  ,  ,  then 

M-l 

Wl  -  -  Vt-T 

and  o5  can  be  interpreted  as  the  error  of  the  estimate. 


(2.28) 


7  K.L.  Peacock  and  S.  Treital,  "Predictive  Deconvolution:  Theory  and 
Practice,"  Geophysics,  Vol.  34,  No.  2,  April  1969,  pp.  135-169. 
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If  the  total  squared  error  is  minimized  as  was  done  for  the 
linear  prediction  approach  in  section  2.2,  a  set  of  equations  is  ob¬ 
tained  for  the  filter  coefficients,  ^faen  a  -  1,  these  equations  re¬ 
duce  to  the  same  equations  as  were  obtained  in  (2.23) 


2.4  Recursive  Filter  Design 

Recursive  filters  are  often  used  in  digital  filter  design  [8-12]. 
With  respect  to  Fig.  1,  the  spectral  estimation  problem  is  now  posed 
in  the  following  manner .  Assume  the  filter  input  is  the  impulse  6  . 
Let  the  filter  impulse  response  be  determined  so  as  to  approximate 
the  data  sequence  in  a  minimum  mean  squared  error  sense.  If 
is  the  input  to  the  filter  and  y£  is  the  output,  then  they  are 
related  by 


M-l  l-l 

l  Vt-k  •  l  br*t-r 
k»0  r-0 

Application  of  the  z-transform  to  both  sides  yields 


8  A.  Oppenneim  and  R.  Shaefer,  Digital  Signal  Processing.  Englewood 
Cliffs,  NJ:  Prentice  Hall,  1975. 

9  J.L.  Shanks,  "Recursion  Filter  for  Digital  Processing,"  Geophysics, 

Vol.  XXXII,  No.  1,  February  1967,  pp.  33-51. 

10  D.L.  Fletcher  and  C.N.  Weygandt,  "A  Digital  Method  of  Transfer  Func¬ 
tion  Calculation,"  IEEE  Trans,  on  Circuit  Theory,  Jan.  1971,  pp.  185-187. 

11  C.S.  Burrus  and  T.W.  Parks,  "Time  Domain  Design  of  Recursive  Digital 
Filters,"  IEEE  Trans,  on  Audio  and  Electroacoustics,  Vol.  AU-18, 

No.  2,  June  1970,  pp.  137-141. 

12  S.  Treital  and  E.A.  Robinson,  "The  Design  of  High  Resolution  Digital 
Filters,"  IEEE  Trans,  on  Geoscience  Electronics,  Vol.  GE-4,  No.  1, 

June  1966,  pp.  25-38. 
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or 


M-l 

r> 

l  \ 

k-0  K 


2~^Y (z) 


L-i 

T  b  z“rx(z) 

r-0  r 


F(z) 


L-l 

L 

r*0 


b  z*r 


M-l 

r* 

) 

k«0 


B  ( 2  ) 
A(2)  . 


(2.29) 


Therefore,  a  recursive  filter  has  a  transfer  function  which  is 
expressed  as  a  ratio  of  two  polynomials  of  the  2-transform  variable. 
The  objective  here  is  to  synthesize  the  filter  from  the  given  impulse 
response  of  the  system.  It  is  important  to  note  that,  unlike  the 
previous  three  types  of  filters,  this  is  not  an  all-pole  filter  but 
a  pole-zero  model.  In  other  words,  given  some  desired  filter  operator 
D(z)  (which  is  the  transfer  function  of  a  desired  system  having  the 
impulse  response  dt) ,  we  require  the  polynomials  A(z)  and  B(z)  of 
the  filter 


such  that 


F(z) 


Mil 

A(z) 


(2.30) 


-1  —9  .TTJ.1 

F(z)  X  D(z)  «  dQ  +  dxz  x  +  d2z  +  . . .  +  d^z  *  x  .  (2.31) 


A  technique  for  determining  A(z)  and  B(z)  is  outlined  next. 

A(z)  -  1  +  alZ-1  +  a2z"2  +  ...  +  a^z-***1  (2.32) 

and 

B(z)  »  bg  +  b^z  ^  +  bjZ  2  ■+■  ...  +  t>L_]_z  (2.33) 

where  M  and  L  are  arbitrary  numbers  which  fix  the  number  of  poles 
and  zeros  respectively  for  the  filter.  Also  divide  B(z)  by  A(z) 
so  as  to  obtain  the  infinite  series 

■  fg  +  f2  z  1  +  f2  z”2  +  ... 

Since  F(z)  A(z)  -  B(z),  it  follows  that 
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F(z)  {  1  +  a;[  z-1  +  a;  z~2  -r  ...  aM_1 


b0  +  bl  Z'X  +  b2  2~2 


•  *  °L-1  2 


Since  multiplication  of  z-transforms  is  equivalent  to  convolution  of 
the  discrete  time  series,  the  series  of  coefficients  is  equal  to 
the  convolution  of  the  f„  coefficients  with  the  at  coef f cients.  Or 
equivalently, 

M-l 

b‘  ■  y l  Vt-r 

By  assumption,  a  *  1.  Therefore, 


ii  i. 

:  *  b  -  J  a .  f 

C  b  j-1  J  C"J 


As  b  ■  0  for  t  >  L  'from  equation  (2.33)),  one  may  write 


M-l 

ft '  -  .1  Vt- 


j  t-j  * 


for  t  >  L 


(2.34) 


By  assuming  the  approximation  in  (2.31)  is  valid,  the  coefficients 
ft  approximate  the  coefficients  d  for  t  >  L.  Equation  (2.34)  may 
then  be  approximated  by 

M-l 

dt  *  -  I  *j 

j~l 


for  t  *  L  +  1,  L  +  2,  ...,  K  -  1 


(2.35) 


The  error  in  (2.35)  is  given  by 


dt  +  I  aj  dt-1 


M-l 

I  aj  dt_j  i  siace  a0  "  1 
3-0 


for  t  -  L  +  1,  L  +  2,  . ..,  K  -  1 


(2.36) 


The  a  coefficients  are  chosen  in  such  a  way  that  the  mean- squared 
error  is  minimized.  In  particular, 

K-l  ,  K-l  M-l  0 


I  -  >  e"  -  >  [  /  a.d  ]‘ 

t-L+1  t-L+1  j-0  J  C-:! 


(2.37) 


is  minimized.  By  differentiating  I  of  equation  (2.37)  with  respect  to 
a.  and  equating  the  derivatives  to  zero,  a  set  of  simultaneous  equa¬ 


tions  is  obtained.  They  are  given  by 


M-l 

l 


j-0 


K-l 

a.[  [  d 

■L+l 


if 


t-jdt-ki 


0 


(2.38) 


for  k  -  1,2,  ... ,M-1. 

For  a  realizable  filter  the  numerator  polynomial  is  generally  one 
degree  lower  than  the  denominator  polynomial  (if  the  poles  are  simple). 
Thus 

L  +  1  -  H. 

-Hence,  (2.38)  reduces  to 
M-l  K-l 


l  V  E  dt-jdt-k]  ” 

j-0  2  t-M  C  2  Z 


(2.39) 


which  is  equivalent  to  the  covariance  equations  as  given  by  (2.10).  The 
right  side  of  (2.10)  is  zero  because 
K-l 

D.  *  I  <5  ,  -  0  for  j  -  1,  2,  ... ,  M-l. 

J  t-M  Z~2 

The  poles  for  the  filter  are  then  obtained  by  the  solution  of  the  poly¬ 
nomial  equation 

M-l 


l  v 

i-0  1 


-i 


It  was  shown  previously  that  the  inverse  filter,  assuming  an 
ail-pole  model  for  the  signal  spectrum,  is  also  equivalent  to  (2.10). 
We  conclude  therefore  that  the  poles  for  an  all-pole  model  correspond 
to  the  identical  set  of  poles  for  a  pole  zero  model  for  the 
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same  order  filter. 


Next  the  residues  at  the  poles  (or  equivalently,  the  numerator 
polynomial)  can  be  obtained  by  minimizing  the  mean-squared  error 

.  ,  r  r  ,  ,  i  2 

given  by  -  c  ]  . 

t 

It  is  interesting  to  note  that  when  the  numerator  polynomial  is 
realized  directly  in  the  form  presented  in  (2.33),  the  problem  re¬ 
duces  to  the  case  of  the  Pade'  approximation  [13].  Mathematically, 

Pade*  approximation  results  in  an  approximation  of  D(z)  by  F(z)  such 
that  the  seminorm 

l|D(z)  -  F(z)i|  -  |D(1>  -  F(l)|  +  |D1(1)  -  F1(l)| 

+  ...  +  lDL‘t^i“1(l)  -  FL'^'1(1)|  (2.40) 

is  made  zero.  Here  Dk(l)  represent  the  k  th  derivative  of  D(z)  evalu¬ 
ated  on  the  unit  circle . 


2.5  Kalman  Filter  Theory 

Underlying  Wiener  filter  design  is  the  so-called  Wiener-Hopf  integral 
equation,  its  solution  through  spectral  factorization,  and  the  practi¬ 
cal  problem  of  synthesizing  the  theoretically  optimal,  filter  from  its 
impulse  response.  The  normal  Wiener  filter  is  derived  from  the  Weiner- 
Hopf  equation  and  in  general  this  equation  can  be  solved  only  in  the 
steady  state,  i.e.  when  the  observation  interval  is  semi-inf inite.  The 
contribution  of  Kalman  was  recognition  of  the  fact  that  the  integral 

13  R.N.  McDonough,  "Representation  and  Analysis  of  Signals,  Part  XV  - 
Matched  Exponents  for  the  Representation  of  Signals,"  Johns  Hopkins 
University,  April  1963. 
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equation  could  be  converted  into  a  nonlinear  differential  equation 

whose  solution  contains  all  the  necessary  information  for  design  of 

the  optimal  filter.  The  problem  of  spectral  f actorication  in  the 

Ltiener  filter  is  analagous  to  the  requirement  for  solving  M(M+l)/2 

coupled  nonlinear  algebraic  equations  in  the  if-order  Kalman  filter. 

These  equations  can  be  solved  numerically  for  transient  type  problems, 

where  data  is  available  only  for  a  finite  interval.  This,  in  general, 

results  in  the  Kalman  filters  being  time— variant  .  However,  in  the 

steady-state  the  Kalman  filter  reduces  to  the  time  invariant  Wiener 

filter  [14],  The  presentation  by  Sorensen  [15]  expresses  the  results 

of  Kalman  filter  theory  in  a  way  that  makes  this  comparison  easier. 

The  problem  involves  estimating  a  signal! s  „v,from  measured  data 

{d  ,  d .  d  }.  If  the  estimate  is  computed  as  a  linear  combi- 

0  1  K— 1 

nation  of  the  d  ,  then 


M-l 

s  »  y  A.d  , 
n  i-0  1  i 


C2.41) 


The  M  coefficients  A^  are  chosen  in  such  a  way  that  the  mean-squared 


error, 


I  -  E[ (s  -  s  )T ( s  -  s  ) ] , 
-n  -n  -n  -n  ’ 


(2.42) 


is  minimized.  Here  T  denotes  the  transpose  of  the  row  vector  (s  -  s  ) , 

-  o  •  n 

This  criterion  is  satisfied  when  the  error  in  the  estimate  s  is 

“  n 

orthogonal  to  the  measured  data,  or 


E[(«  ~  8  >d;]  -  0, 

-n  -n  i 


for  i  ■  0,1, ... ,  M-l 


(2.43) 


14  A.  Gelb,  "Applied  Optimal  Estimation,”  MIT  Press,  1974.. 

15  H.W.  Sorensen,  "Least-squares  Estimation  from  Gauss  to  Kalman," 
IEEE  Spectrum,  July  1970,  pp.  63-68. 


Expressed  in  a  different  way 


M-l 

E[s  d:i  -  '  A-E[d^dt] 

i-0 


:or 


0,1,..., M-l. (2.44) 


However,  the  matrix  inversion  that  is  required  becomes  computationally 
impractical  when  M  is  large.  Wiener  and  Kolmogoroff  assumed  an  infinite 
amount  of  data  (that  is,  the  lower  limit  of  the  summation  is  -00  rather 
than  zero).  £q  (2.43)  is  then  the  discrete  form  of  the  Wiener-Hmpf 
equation  which  was  solved  using  spectral  factorization. 


The  basic  difference  between  Wiener-Kolmogorof f  theory  and  the 
Kalman  filter  theory  is  how  equation  (2.44)  is  solved.  In  1955  J.W.Follin 
suggested  a  recursive  approach  to  solve  (2.44).  It  is  clear  (see  reference 
15.  p.  65)  that  Foilin' s  work  provided  a  direct  stimulus  for  the  work  of 
Richard  Bucy,  which  led  to  his  subsequent  collaboration  with  Kalman  in 
the  total  development  of  the  "state  space"  approach  for  obtaining  the 
filter  equations. 


2.6  Summary 

As  outlined  above  all  forms  of  the  digital  Wiener  filter  lead 
either  to  the  covariance  or  the  autocorrelation  equations.  It  is 
also  interesting  that  the  same  set  of  poles  is  obtained  whether  one 
models  the  signal  as  an  all-pole  model  or  as  a  pole-zero  model. 
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III.  STOCHASTIC  METHODS  APPLIED  TO 
SYSTEM  IDENTIFICATION 

Three  different  stochastic  methods  used  in  spectral  estimation 
are  presented  in  this  section.  They  are  maximum  likelihood  estimation, 
minimum  predictor  error  variance  estimation,  and  maximum  entropy 
spectral  analysis.  All  three  models  are  considered  as  all-poie  models. 
These  methods  have  no  relationship  with  discrete  Wiener  filtering 
theory.  The  methods  presented  in  this  section  start  with  completely 
different  assumptions  but  finally  yield  the  identical  set  of  either 
covariance  or  autocorrelation  equations  which  characterize  the  system 
to  be  identified  from  the -measured  impulse  response. 

In  these  methods  it  is  assumed  that  the  data  samples  are  part 
of  a  random  process.  The  problem  is  to  choose  the  parameters  of  the 
system  impluse  response  so  as  to  make  the  probability  of  occurence  of 
the  actual  observation  most  likely.  In  other  words,  the  system  para¬ 
meters  are  chosen  in  such  a  way  that  the  probability  density  function 
defining  the  parameters  is  maximized. 


3.1  Maximum  Likelihood  Estimation  Theory 

In  this  approach  the  measured  impulse  response  of  the  system  is 
considered  as  a  segment  of  a  random  process.  It  is  further  assumed 
that  the  impulse  response  can  be  generated  by  passing  an  uncorrelated 
noise  sequence  \e  }  through  an  all-pole  model  of  the  form 
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1 


f  .2 

i-0  1 


Next  the  random  process  \  e_;  is  assumed  to  oe  Gaussran  with 
2  „ 

zero  mean  and  variance  c” .  Thus , 

e 


E\ e  ;  ■  0  and  E{e,e.;  <*  c  c“  (3.2) 

t  i  j  ij  e 

As  the  measured  impulse  response  x^ ,  t  *  0,  1,  K-l  has  been 

assumed  to  be  generated  by  passing  the  noise  sequence  \e  }  through  the 

all-pole  model,  it  follows  that 


M-l 


(3.3) 


From  (3.2)  and  (3.3)  it  is  clear  chat  the  sequence  x^  is  Gaussian  with 
zero  mean  and  a  cross-correlation  defined  by 

]  -  gi_j  .  (3.4) 

This  correlation  sequence  g.  .  would  then  be  a  function  of  the  svstern 

i-J 

2 

parameters  f , ,  i  ■  0,  M-l  and  c  .  Since  x  is  Gaussian,  a 

1  St 

Gaussian  multivariate  probability  density  function  is  defined  for  the 
sequence  of  random  variables  Xq,x^,  ...,  The  maximum  likelihood 

theory  assumes  chat  the  parameter  values  which  make  the  measured  ob¬ 
servation  of  the  impulse  response  most  likely  are  the  same  values 
which  maximize  the  joint  probability  density  function  of  x^,  i  -  0, 

1,  ...,  K-l.  This  can  be  achieved  by  differentiating  the  density 

2 

function  with  each  of  the  unknown  variables,  f,  ,  f„,  ...,  f,.  ,  and  c 

1  2  M— 1  e 

and  then  setting  the  first  partial  derivative  equal  to  zero.  The 
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solutions  of  the  set  of  equations  then  yield  the  values  for  the  unknown 
parameters.  Even  though  the  procedure  is  conceptually  simple,  the  set 
of  equations  becomes  extremely  nonlinear  for  M  greater  than  2  and  no 
exact  solution  for  this  problem  exists  [2]. 

However,  Itakura  and  Saito  [16,17]  solved  the  maximum  likelihood 
problem  by  making  some  additional  assumptions.  First,  the  number  of 
data  points  K  is  made  such  greater  than  M,  the  order  of  the  filter 
(i.e. ,  K  >>  M) .  Second,  the  joint  probaoility  density  function  for 

the  sequence  x  ,x  ,x  ,  . ..,  x„  is  approximated  by 

U  X  J.  X 

p[x0,x1,...,xK_1;  “  [2ttc“]  K/“exp[-a/2ta“]  (3.5) 

where 

K-1+  M  M—l 

a  -  21  fix.  C3-6'-1 

t-0  i-0 

It  is  interesting  to  note  that  a  has  the  identical  form  of  the  error 
defined  by  the  autocorrelation  equations  in  linear  prediction  (see 
equation  (122) ) . 

It  has  been  shown  [16,  17]  tnat  the  results  obtained  for  the  un¬ 
known  filter  coefficients  f  are  identical  to  (2.13)  which  utilizes 
the  autocorrelation  equations. 

The  corresponding  equations  for  the  covariance  method  are  ob¬ 
tained  by  defining  a  conditional  density  function  for  the  probability 
density.  This  is  achieved  by  treating  the  M  data  points  x^,  x  ,  ..., 
x^_^  as  a  set  of  deterministic  initial  conditions  and  the  remaining 

16  F.  Itakura  and  S.  Saito,  "Analysis  Synthesis  Telephony  based  on 
Maximum  Likelihood  Method,"  6th  International  Congress  on  Acoustics, 
Tokyo,  Japan,  Aug.  21-28,  1968,  C-5-5,  pp.  C-17-20. 

17  F.  Itakura  and  S.  Saito,  "Extraction  of  Speech  Parameters  based  upon 
the  Statistical  Method,"  Proc.  Speech  Info.  Process,  Tohaku  Univers 
Sendai,  Japan,  5.1,  5.12  (1971)  (in  Japanese). 
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K.-M  data  points  as  a  set  of  random  variables.  Under  the  above  assump¬ 


tions,  the  conditional  probability  density  function  is  approximated  as 
[17,  2] 

Pc \ ‘ ,XK-1^  : (xo,xl’*** ,XM-1^  '  “ 

(27ra  2)‘0,5(K“Mdxp[-a/2c2]  (3.7) 

e  e 

where 

K-l  M-l  , 

a  -  )  [  T  f  .x  (3.8) 

-  l  t— i 

t-M  i-0 

Again  it  is  clear  that  a  has  the  same  form  as  the  error  energy  defined 
for  the  covariance  equations  for  the  case  of  linear  prediction  (see 
(2.22)).  Maximization  of  the  conditional  probability  density  func¬ 
tion  is  then  achieved  by  maximizing  p  with  respect  to  the  unknown 

c 

2 

filter  coefficients  fi  and  0  .  The  set  of  equations  (see  (2.10)) 
obtained  in  this  case  are  identical  to  those  of  the  covariance  method. 


3.2  Minimum  Predictor  Error  Variance 

In  this  method  the  data  samples  are  not  considered  as  a  part  of 
a  Gaussian  process.  In  other  words,  the  method  remains  the  same  as 
before,  i.e.  the  measured  impulse  response  is  generated  by  passing  a 
noise  sequence  {e^}  through  an  all-pole  model  [2].  It  is  assumed 
that  the  error  sequence  is  of  zero  mean  and  of  variance  given  by 


E[e*] 


M 

l 


M 

I 


i-0  j-0 


f  .f . 
i  J 


E[xt-ixt-j] 


(3.9) 


Next  the  process  is  assumed  to  be  stationary  so  that  the  expectation 
in  (3.9)  can  be  expressed  as 
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E[x  .x  ]  -  g . 
t-l  C-'’  ’ 


(3.10) 


and  the  error  variance  as 


EfO  -  I  I 

i-0  j-0  J  3 


(3.11) 


The  problem  is  to  determine  the  filter  coefficients  so  as  to  minimize 
the  error  variance. 

An  additional  assumption  is  now  made.  Specifically,  it  is  assumed 
that  the  process  is  ergodic  so  that  the  ensemble  average  E  may  be 
converted  to  a  time  average.  Hence,  the  approximation 

K-l 


g 


‘  K  -  K  t*M 


'ij 


(3.12) 


leads  to  the  covariance  equations  (2.10)  and  the  approximation 


«i-j  *  7  /•  x-x 


t-o 


t  t+ji-j 


ri-j 


(3.13) 


leads  to  the  autocorrelation  equations  (2.13).  Hence,  this  method 
yields  a  set  of  analysis  equations  identical  to  those  for  the  dis¬ 
crete  Wiener  filter. 


3,3  Maximum  Entropy  Spectral  Analysis 

An  important  aspect  of  time  series  analysis  is  the  computation 
of  the  power  spectral  density  which  is  primarily  determined  by  the 
second  order  statistics.  In  an  actual  experiment,  the  number  of 
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r 


T 


data  points  is  always  finite.  Hence,  for  the  problem  of  interest  the 
data  length  may  not  be  sufficient  to  obtain  a  specified  degree  of 
frequency  resolution.  Also,  given  a  finite  number  of  K  data  points, 
we  can  obtain  at  most  approximations  of  the  K  autocorrelation  func¬ 
tions  rA,  rn  ,  . . . ,  r_  In  the  previous  autocorrelation  formulations 

the  data  has  been  assumed  to  be  zero  outside  the  known  interval.  In 
some  instances,  this  may  be  an  unreasonable  assumption  about  the  extension 
of  the  data  beyond  the  known  interval.  The  question  then  arises  as  tc  what 
assumptions  should  be  made  about  the  data  outside  the  finite  sample 
and  what  assumption  should  be  made  about  their  second  order  statis¬ 
tics  (i.e.  the  autocorrelation),  since  they  determine  the  power 
spectral  density. 

Burg  proposed  an  information  theory  approach  to  the  problem.  He 
suggested  [18]  that  the  most  reasonable  choice'  of  the  unknown  auto¬ 
correlations  Is  the  one  which  adds  no  information  or  adds  most 
randomness  or  maximizes  the  entropy.  He  then  proceeded  to  select  the 
power  spectral  density  having  the  maximum  entropy  of  all  possible 
spectra  that  agrees  with  the  known  values  of  the  autocorrelation 
function  r  . 

The  information  content  of  a  random  process  is  defined  in  terms 
of  a  quantity  called  entropy  and  is  mathematically  expressed  as 


H  ■  -  [  Pin  P, 
1  j  J 


C3.14) 


where  P^  is  the  probability  of  the  jth  event  of  a  random  process. 


18  J.P.  Burg,  "Maximum  Entropy  Spectral  Analysis,"  Ph.D.  Thesis, 
Stanford  University,  Palo  Alto,  California  1975. 


When  the  random  variable  takes  on  a  continuum  of  values,  the  sum  in 
the  definition  of  the  entropy  is  replaced  by  an  integral.  Since 
we  are  dealing  with  a  time  series  Xq,  x^,  x^  ^ ,  the  probability 

is  replaced  by  the  joint  probability  density  function  p(x^,x1 ,  ....  x^  ^) 
Thus 


H 


p(xQ,x1,  {p(xQ,x1, 


xK_1>dV  (3.15) 


where  dV  is  an  element  of  volume  in  the  space  spanned  by  the  random 
variables.  Burg  then  proceeded  to  adjoin  a  hypothetical  variable  x^ 
to  the  available  estimates  of  the  autocorrelation  function  r^,  r  , 
r^  and  so  on.  We  may  then  consider  the  joint  probability  density 
available  for  the  K  data  points  and  the  adjoined  x^  as 

P(*0.xr  ...,  x^^jX^)  (3.16) 


This  probability  density  function  has  an  entropy 


H 


,x 


1’ 


,xK_1,xK)ln{p(x0,x1, • •.xK_1,xK)JdV  (3.17) 


Burg  chose  as  (3.16)  that  probability  density  function  which  has  its 
first  K  second  order  moments  as  r  ,r  ,  ...,  r  ,  and  which  under 
the  given  constraint  maximized  (3.17).  The  obvious  choice  for  the 
probability  density  function  in  (3.16)  is  Gaussian  since  according 
to  Shannon  and  Weaver  [19,  20]  the  Gaussian  distribution  results  in 
maximum  entropy  under  a  constant  energy  constraint.  Thus 

19  C.E.  Shannon  and  W.  Weaver,  The  Mathematical  Theory  of  Communica¬ 
tion.  Urbana,  Illinois:  University  of  Illinois  Press,  1962,  pp.  56-5 

20  R.N.  McDonough,  "Maximum-entropy  spatiaL  processing  of  array  data," 
Geophysics,  Vol.  39,  No.  6,  December  1974,  pp.  843-851. 
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(3.18) 
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-1., 


P  (  Xq  >  x^  f  •  * « , x^.^  ^  )  " 


exp(-  y  X'  [R^]  X.' 


r  n 


K+l 


C2i r)  ~  detlRjJ 


,1/2 


where  X  is  the  column  vector  of  the  x„,  the  prime  indicates  the  trans¬ 


pose,  and  the  matrix  [R^]  is  given  by 

r  _ 


IV  - 


rK-l 


K.-1  R 


"K-2 


The  entropy  can  then  be  expressed  as  [l9 ,  20 ] 
H  -  y  £n{  (2Tre)K+1  detlR^]} 


■R-l 


(3.19) 


(3.20) 


Now  rR  is  to  be  chosen  in  such  a  way  that  H  in  (3.20)  is  maximized. 
Hence  the  value  of  is  the  one  which  maximizes  det  [R^]. 

In  order  for  r^  to  constitute  a  proper  set  of  autocorrelation 
values,  the  matrix  [R^]  must  be  positive  semi-definite  [21],  More¬ 
over,  det  [R  ]  is  a  quadratic  function  in  r  .  It  follows  that 
K  K 

maximizing  det  [R^j  with  respect  to  r^  yields  the  value  of  rR  obtained 
from  the  solution  of  the  following  equation 


det 


r0  -  rK-2 


r2  ri 


K-3 


rK-l -  rl 


-  0 


(3.21) 


21  A.  Van  den  Bos,  "Alternative  Interpretation  of  Maximum  Entropy 

Spectral  Analysis,"  IEEE  Trans,  on  Information  Theory,  Vol.  IT-17, 
No.  4,  1971,  pp.  493-494. 
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A  Discussion  of  Various  Approaches  to  the  Identiflcation/Approximation  Problem 
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Abstract — Tbc  ideatifkatioa/approximation  of  a  linear  system  by 
poles  and  residues  from  a  measured  finite  length  inpul-output  record 
of  the  system  is  discussed.  The  objective  of  this  paper  is  to  illustrate 
that  aeveral  different  formulations  for  characterizing  the  impulse 
response  of  a  system  yield  the  same  set  of  poles. 

1.  INTRODUCTION 

N  RECENT  years  the  singularity  expansion  method  (SEM) 
has  been  used  very  successfully  to  study  transient  phenom¬ 
ena  in  electromagnetic  radiation  and  scattering  problems. 
With  this  approach,  information  is  obtained  about  the  electro¬ 
magnetic  structures  from  measured  transient  responses  to 
known  inputs.  The  information  leads  to  a  characterization 
of  the  impulse  response  of  the  electromagnetic  system  by  a 
sum  of  damped  exponentials.  It  is  desirable  to  know  the 
complex  natural  frequencies  or  poles  with  a  high  degree  of 
accuracy.  The  problem  of  extraction  of  the  poles  from  the 
measured  transient  response  data  is  reduced  to  a  system  ap¬ 
proximation/identification  problem . 

This  paper  deals  with  the  identificaiion/approximation  of 
a  linear  system  by  poles  and  residues  from  a  measured  finite 
length  input-output  record  of  the  system.  The  objective  of 
this  paper  is  to  illustrate  that  several  different  formulations 
for  characterizing  the  impulse  response  of  a  system  yield  the 
same  set  of  poles. 

Recognition  that  different  formulations  yield  the  same 
poles  is  not  widely  appreciated  It  is  shown  here  that  formula¬ 
tions  based  upon  different  assumptions  result  in  identical 
sets  of  analyst-  equations  For  example,  it  is  not  at  all  ob¬ 
vious  that  Piony's  method  (as  derived  by  most  numerical 
analysis  formulations)  may  be  interpreted  in  terms  of  pre¬ 
dicting  each  value  of  data  and,  therefore,  is  a  form  of  digital 
Wiener  filter.  Markel  [2i  did  recognize  the  fact  that  identical 
analysis  equations  can  be  obtained  by  several  different  tech¬ 
niques. 

In  our  discussion  of  the  various  approaches,  only  references 
which  are  directly  relevant  are  noted.  No  attempt  has  been 
made  to  cite  the  earliest  sources.  In  many  cases,  additional 
references  may  be  found  in  papers  mentioned. 

The  problem  of  interest  is  to  identify/approximate  the 
transfer  function  of  a  system  by  its  poles  and  residues  when 
the  noise-contaminated  input  and  output  are  specified.  The 
signal  and  noise  are  considered  to  be  stationary  processes. 
When  time-limited  signals  are  involved,  the  problem  is  con- 
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verted  to  an  equivalent  stationary  problem  by  convolving 
the  time  limited  signal  with  white  noise  of  unit  power  [  3  ] 
In  the  second  section  the  classical  method  for  extracting  the 
signal  from  noise  ts  discussed  This  is  the  Wiener-Kolmogoroff 
theory  |4).  The  digital  form  of  the  Wiener-Mopf  equation 
is  derived.  Topics  associated  with  Wiener  filters  are  also 
presented  They  include  inverse  filter  design,  linear  prediction, 
predictive  deconvolution  (or  spiking)  filter  design,  recursive 
filter  design,  and  Kalman  filtering  Both  the  popularly  known 
covariance  and  autocorrelation  methods  are  derived  from  the 
Wiener  filter  theory. 

In  the  third  section  the  various  stochastic  extensions  are 
described.  They  include  the  maximum  likelihood  estimation 
theory,  the  minimum  predictor  error  variance  and  the  maxi¬ 
mum  entropy  spectral  analysis  It  is  demonstrated  that  iden¬ 
tical  analysis  equations  for  parametric  modeling  of  the  system 
c  i  be  obtained 

The  fourth  section  provides  Prony’s  method  in  various 
forms  In  particular,  when  a  semi-least-squares  approach  is 
applied  to  Prony’s  method,  both  the  autocorrelation  and  the 
covariance  method  appear  as  special  cases.  Thus,  it  is  also 
a  form  of  a  digital  Wiener  filter. 

II  WIENER  FILTER  THEORY 

Kolmogoroff  in  1942  and  Wiener  in  1943  were  the  first 
to  present  a  unified  theory  on  extrapolation,  interpolation, 
and  smoothing  of  stationary  time  series.  The  linear  filter 
which  performs  the  desired  task  is  obtained  by  the  solution 
of  an  integral  equation  known  as  the  Wiener-Hopf  equation 
(4).  For  sampled  data  systems,  the  integral  form  of  the 
Wiener-Hopf  equations  reduces  to  a  finite  sum.  The  present 
treatment  describes  how  Wiener’s  concepts  can  be  applied 
to  the  identification/approximation  of  linear  systems.  The 
basic  model  for  this  process  consists  of  an  input  signal,  a 
desired  output  signal,  and  an  actual  output  signal.  If  one 
minimizes  the  mean-squared  error  between  the  desired  out¬ 
put  signal  and  the  actual  output  signal,  it  becomes  possible 
to  solve  for  the  optimum  system  commonly  known  as  the 
“Wiener”  filter.  The  fundamental  assumption  underlying  the 
procedure  is  that  all  processes  are  stationary. 

A  stationary  time  series  is  one  whose  statistical  properties 
are  time  invariant.  In  particular,  the  statistics  of  the  time 
series  are  independent  of  time.  By  definition,  a  stationary 
time  series  must  be  of  infinite  duration.  However,  in  an 
actual  experiment,  we  observe  a  times  series  over  a  finite 
interval.  In  order  to  apply  the  concepts  of  Wiener  filtering, 
the  finite  length  time  series  is  convolved  with  a  white  noise 
series  of  unit  power,  to  yield  a  stationary  time  series  [3], 
Moreover,  in  actual  measurements  only  one  waveform  is 
often  available  for  computing  various  statistics.  Thus  ensemble 
averages  are  frequently  evaluated  by  means  of  appropriate 
time  averages.  This  is  valid  only  when  the  random  processes 
are  ergodic. 
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So^o  Oo’o 


j  _ l'mJ 


Output  S'gno1 
dt 


In  order  to  solve  the  above  equations,  it  is  necessary  to 
compute  the  expected  values  in  (5).  By  assuming  that  the 
ensemble  averages  can  be  evaluated  in  terms  of  time  averages, 
one  obtains  13] 


-T*t-i}=  2  xt-rxt-i  =  CT/ 

*  —  M  t=M 

(in  the  covariance  method) 


0,  I  2  .  <•«-? 

Fig.  1 .  Principles  of  Wiener  filtering 

Although  the  concept  of  Wiener  filtering  is  well-known,  it  is 
included  here  for  completeness.  The  fundamental  elements  of 
digital  Wiener  filtering  are  summarized  in  Fig,  1  for  the  sam¬ 
pled  data  problem  Given  the  input  sequence  x,  and  a  desired 
output  sequence  dt,  the  problem  is  to  find  the  linear  filter 
coefficients  /,,  whose  output  sequence  *,•/,  (the  asterisk 
denotes  convolution)  yields  a  minimum  mean-squared  error  Similarly 
estimate  of  dt.  If  £{•}  denotes  the  expected  value,  then  the 
enor 


1 

K-l+M 

a! 

K 

j 

_  I 

K 

K  -  1 

2  ; 

r=o 

(in  the  autocorrelation  method). 


I  =  E{(dt~y,)2}  (1) 

is  to  be  minimized. 

In  this  case  an  Af-length  filter/,  =  {/0,/t,  ''./m-i}  con¬ 
verts.  in  a  least-error  energy  sense,  a  A -length  input  x,  = 
{.r0.  x, ,  -  t }  into  a  A.'  +  St  -  1  length  sequence  y,  which 

approximates  the  desired  sequence  d,  =  {d0,di ,  2}. 

The  actual  output  is  obtained  as 


£{</,x, - -  2  d'X,-i 


(in  the  covariance  method) 


K  r-  o 

(in  the  autocorrelation  method). 


yt  = 

M-  1 

=  X,  •  /,  =  2  A-Xf-r 


It  is  important  to  stress  that  the  terms  “covariance”  and 
“autocorrelation”  are  not  based  upon  the  standard  usage  of 
(2)  the  terms  as  occur  in  the  theory  of  stochastic  processes. 
Rather,  we  follow  the  usage  which  is  quite  prevalent  in  the 
speech  processing  literature  [5],  The  following  discussion  is 


The  prob  em  of  making  the  actual  output  y,  as  close  as  pos-  ......  ,  ,  ....  .  ,  „  ... 

...  .  .  .  .  .  ..  ..  ,  intended  to  clarify  their  interpretation. 

sib  e  to  the  desired  output  d,  can  be  interpreted  in  terms  of  ,  ..  .  ..  ...  ...  ........ 

.....  „  It  is  clear  that  the  covanance  definition  given  by  (6)  yields 

minimizing  the  error  energy  .  .  ..  . 

an  unbiased  estimate  smce 

•>-*!(*.- s' <3,  *2' 

\_K  —  M  ,-M 


The  error  is  minimized  by  evaluating  the  partial  derivatives  of 
I  with  respect  to  fT  and  equating  them  to  zero.  This  results  in 
a  set  of  equations 

M  I 

=  -2£{<Vr-/}  +  2  2  /r£{x,_r*r-/} 

T=0 

=  0,  for  /  =  0.  1,  2.  ■,  M-  1  (4) 

The  unknown  filter  coefficients  are  obtained  by  solving  the 
following  set  of  simultaneous  equations.  This  is  the  discrete 
Wiener-Hopf  equation. 

M  i 

frE(x,-  T*!-/}  =  £{<ffX»-/} 


1  1 

=  7 - 77  2 

K  —  M  t=M 

=  E  {x  t-rx  t-  j\- 

On  the  other  hand,  the  autocorrelation  definition  given  by 
(7)  results  in  a  biased  estimate  since 

"j  k-l+M 

^{xr-T-Xr-./}!  =E  ~  2  x,_Tx,_, 

f=  0 

2  K  — !  r — / 1  -  1 

A  0 


=  E{X, 


for /  =  0.  I.  -,  M-  I. 


(5)  Since  (r  -  /)  <  M,  it  is  interesting  to  note  that  the  bias  is 
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negligible  when  Af  (the  order  of  the  filter)  is  much  less  than 
A  The  bias  is  significant,  however,  when  A  is  only  slightly 
larger  than  Af.  This  explains  why  a  large  number  of  data 
points  (A  >  Af)  is  necessary  for  unbiased  spectral  estimation 
when  using  the  autocorrelation  function  to  obtain  the  power 
spectral  density. 

Because  the  covariance  method  gives  an  unbiased  estimate, 
it  might  be  assumed  that  it  is  a  better  estimate.  However,  the 
biased  estimate  provided  by  the  autocorrelation  method  is 
often  preferable  As  an  example  consider  the  zero  mean  four 
data  point  sequence  (4.  -2,  -1,  -1)  and  Af  =  2.0.  Using  the 
unbiased  estimator,  the  expected  values  are  C00'  =  1.0, 
C,0’  -  1.5.  C2q  =  -1.0,  f'jo  =  -2.0.  In  contrast,  the 

biased  estimator  results  in  r(0)  =  5.5,  r(l)  =  -1.25,  = 

-0.5.  3)  =  -1,0,  Note  that  C00'  is  less  than  C10'  whereas 

e(o)  is  guaranteed  to  be  greater  than  r(z)  for  t  >  0.  Hence,  the 
autocorrelation  method  yields  nonnegative  power  spectral 
densities  whereas  the  covariance  method  may  not. 

Continuing  with  our  development,  the  discrete  Wiener- 
Hopf  equation  presented  in  (5)  can  be  written  in  the  follow¬ 
ing  matrix  form  for  the  covariance  method: 


transformed  to  an  impulse  of  area  o  by  an  all-zero  filter 
of  the  form 


Af  l 

F 00=  2  with/0  =  1. 

/=o 


In  terms  of  Fig  1,  the  desired  waveform  d ,  is  an  impulse 
of  area  o.  It  follows  that  the  coefficients  of  the  filter  should 
be  chosen  such  that 


Af  -  1 

2  Uxt-r  =  o6, 
T—  0 


(IM 


where  6,  =  1  for  ;  =  0  and  zero  otherwise.  Multiplication  of 
both  sides  by  x,_j  and  summing  r  from  k  to  A'  -  1  +  /,  one 
obtains 


Af  t 


2/r 

*=  n 


Alt/ 

2  X!-Txt-, 
f-  k 

= 

0  k  >  0 

OX_j,  k  =  0 


(17) 


(Q/]AfxAf[^ilAfX  I  -(^;]mxi  (*°) 

where  lC,-yl  is  a  square  matrix  whose  elements  are  given  as 

Cii  =  2  b  (I1) 

r=M 

IF,1  is  a  column  matrix  consisting  of  the  Af  unknown  filter 
coefficients  /0.  ft,  (,  and  [0/1  is  a  column  matrix 

whose  elements  are  given  by 

A-t 

D,  =  2  (12) 

r-  m 


Since  .r  ■  is  zeio  for  /  >  0,  the  unknown  coefficients  fr  for 
r  =  1,2,  •••,  Af  -  1  are  obtained  from  the  solution  of  the 
following  equations: 


mi  r*~  i  ♦/ 

f T  Xt~TXt~  j 

r  =  0  L  t-k 


=  0 


(IS) 


for  /  =  1,2,  — ,  \l  -  1 .  Fur  k  =  Af  and  /  =  0,  the  above  equa¬ 
tions  reduce  to  that  of  the  covariance  equations  as  in  (10). 
This  is  because 


K-  1 

Dj=  2  btxt-)  =  0. 
t=M 


(19) 


For  the  autocorrelation  method,  the  unknown  filter  coef¬ 
ficients  are  obtained  from  the  solution  of  the  matrix  equation 

[-^|i-/|] Af x Af i  =  [29/] AfX  l  (13) 

where  [A  i,_;i)  is  a  square  matrix  whose  elements  are  given  as 
K  -  1  +Af  A-t -If- /I 

^|f-/l  =  2  xt-ixt-i~  ^  xrxr+li-/l  d^) 

o=0  r=0 

and  [Dj]  is  a  column  matrix  defined  by  1121. 

Interestingly,  most  of  the  formulations  for  the  solution  of 
an  unknown  linear  filter  lead  to  analysis  equations  which  can 
be  formulated  either  in  terms  of  the  autocorrelation  matrix 
equations  (13)  or  in  terms  of  the  covariance  matrix  equations 
(10) 

It  is  important  to  note  that  the  Wiener  filter  is  not  always 
realizable  as  a  causal  rational  function  (in  terms  of  poles  and 
zeros).  As  a  result,  the  Wiener  filter  is,  in  general,  an  infinite 
order  filter.  When  the  order  of  the  filter  is  specified  a  priori, 
the  resulting  filter  may  no  longer  be  optimum. 

Various  forms  of  the  Wiener  filter  have  appeared  under  dif¬ 
ferent  names  and  have  been  used  in  various  geophysical, 
speech  processing,  and  digital  signal  processing  applications. 
Next,  various  modifications  of  the  Wiener  filter  are  presented. 

A  Inverse  Filter  Formulation 

The  inverse  filter  attempts  to  transform  the  input  signal 
into  an  impulse  (6).  Assume  that  the  input  sequence  x,  is 


For  k  =  0  and  /  =  M,  (18)  reduces  to  that  of  the  autocor¬ 
relation  equations  If  the  input  signal  is  approximated  by  an 
all-pole  model,  the  poles  of  the  input  signal  are  obtained  from 
the  zeros  of  F(z).  i.e.,  from  the  solution  of  the  polynomial 
equations 

Af-  1 

2/rr~T  =  0.  (20) 

T=0 

In  particular,  if  (z~')i  is  the  ith  root  of  the  above  equation 
(20),  then  the  ith  pole  is  equal  to  In  [(z_  1 )() . 

B.  Linear  Prediction 

The  term  "linear  prediction”  was  first  used  by  Wiener  in 
his  classic  work  on  prediction  of  stationary  time  series.  Since 
its  publication,  it  has  found  wide  application  in  the  determina¬ 
tion  of  all-pole  models  for  the  processing  of  speech  signals 
15). 

The  basic  philosophy  here  is  to  take  a  part  of  the  sampled 
waveform  (say  the  first  M  -  1  points  from  A  data  points) 
and  predict  the  next  data  point  on  the  waveform  by  proper 
choice  of  the  predictor  coefficients  a,-.  The  linear  predictor 
of  step  size  one  predicts  the  Afth  data  point  of  the  waveform 
when  a  (Af  -  1)  order  predictor  filter  is  chosen.  In  the  time 
domain,  the  predicted  sample  xm  is  given  by 

Af  I 

*Af  =  —  2  ai*M  > 

1 
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where  (-a,,  -a2,  -a3,  —aju- 1)  116  the  predictor  coef¬ 
ficients.  Assuming  the  signal  spectrum  is  to  be  modeled  by 
an  all-pole  model,  the  coefficients  a,  are  the  negative  of  the 
values  of  /,  presented  in  (15).  A  (Af  -  1)  order  linear  pre¬ 
dictor  thus  requires  a  linear  combination  of  the  previous 
(Af  -  1 )  samples.  The  error  is  then  given  by 


eM  ~  XM  x  M 

Af-  1 

=  ~  2  + 

i=  1 


Af-  1 

=  —  2  alxM- i.  with  a0  =  — 1 .  (21) 

l=o 


When  (K  -  1  +  1)  different  data  points  are  predicted,  the  total 
squared  error  is  defined  by 

AT-l+J  JT-lv/FAf-i  ~|2 

/=  2  lfr]2  =22  aixt-l 

r=k  r=k  L  t=o 


Af  -  1  Af  -  1 

=  2  2 


»=o  j- o 


K-  1+f 

2  *t-ixt-j 

t=k 


Af  -  l  Af  -  1 

=  2  2  xi-ixt~i 

1=0  /  =  0  t=k 


(22) 


y,  is  assumed  to  be  a  delayed  impulse  such  that 
Af  -  1 

^  frxt—  t  O&t—  a+ 1  -  (24) 

T  =  0 

The  unknown  filter  coefficients  /T  can  be  pursued  in  a  manner 
analgous  to  the  inverse  filter  approach  by  assuming  the  input 
to  be  represented  as  an  all-pole  model.  However,  we  prefer 
to  show  that  the  filter  coefficients  can  also  be  determined 
by  interpreting  the  problem  as  a  prediction  problem  It  is  in 
this  sense  that  the  term  predictive  deconvolution  is  used  [7] . 

Introduce  the  change  of  variables  t  -  a  +  1  =  0  in  (24). 
The  resulting  equation  is 


Af-  I 

2/  / r x8 + q  —  t -  1  ~  oSff.  (25) 

r=0 

Next,  assume  the  filter  coefficients  to  be  given  by 


o-l  zeros 


(—1, 0,  0,  -,  0,  a  j ,  &2 ,  •  ~a\t-  l)  (26) 

The  upper  limit  on  the  summation  is  now  given  by  Af  +  a  -  2. 
Equation  (25)  can  now  be  written  as 

Af+o-2 

Xf+a~l  frxt*a-T-l 

r=  1 


Minimization  of  1,  with  respect  to  the  set  of  the  filter  coef¬ 
ficients  leads  to  a  set  of  simultaneous  equations  given  by 


M  —  1  K-l+l 

2  ^  2  xt~ixt-i 

1=  0  t=k 


=  0 


(23) 


for  I  —  1,2,  — ,  Af  -  1,  from  which  the  unknown  filter  coef¬ 
ficients  /,  are  obtained.  When  k  —  M  and  /  =  0,  this  amounts 
to  minimization  of  the  error  only  over  data  points  of  the 
waveform  from  Af  to  K  -  1.  When  k  =  0  and  /  =  Af,  this 
implies  minimization  over  the  entire  waveform.  Equation 
(23)  are  identical  to  the  set  of  equations  (19)  obtained  in  the 
case  of  the  inverse  filter  formulation  in  the  previous  section. 

Linear  prediction  is,  therefore,  equivalent  to  an  all-pole 
model  for  the  spectrum  of  the  input  x,.  The  poles  for  this 
model  are  again  obtained  from  the  solution  of  the  polynomial 
equation 


1=0 


Thus,  if  x,  represents  the  measured  impulse  response  of  the 
system,  the  set  of  poles  obtained  by  linear  prediction  can  be 
interpreted  so  as  to  parameterize  the  system  in  terms  of  an 
all-pole  model. 


—  xl+a~  1 


Af-  I 

arxt-r 

T=  1 


=  06,.  (27) 

If  the  estimate  of  xf+0_1  is  assumed  to  be  given  by  x 
then 


M- J 

Xf+O-  I  ^  Xt—T  (28) 

r=  I 

and  oSr  can  be  interpreted  as  the  error  of  the  estimate. 

If  the  total  squared  error  is  minimized,  as  was  done  for  the 
linear  prediction  approach  in  Section  II-B,  a  set  of  equations 
is  obtained  for  the  filter  coefficients.  When  a  =  1,  these  equa¬ 
tions  reduce  to  the  same  equations  as  were  obtained  in  (23). 

D.  Recursive  Filter  Design 

Recursive  filters  are  often  used  in  digital  filter  design  [81- 
{ 1 2 1 .  With  respect  to  Fig.  1,  the  spectral  estimation  problem 
is  now  posed  in  the  following  manner.  Assume  the  filter  in¬ 
put  is  the  impulse  5,.  Let  the  filter  impulse  response  be  de¬ 
termined  so  as  to  approximate  the  data  sequence  in  a  mini¬ 
mum  mean  squared  error  sense.  If  x,  is  the  input  to  the  filter 
and  yt  is  the  output,  then  they  are  related  by 


C.  Predictive  Deconvolution  or  Spiking  Filter  Design 

The  general  linear  filtering  problem  involves  the  input 
jrf,  the  impulse  response  h,  and  the  output  yt.  When  it  is 
desirable  to  evaluate  h,  given  x,  and  yt,  the  problem  is  referred 
(o  as  deconvolution.  In  this  sense  the  inverse  filter  problem 
discussed  in  Section  II- A  is  a  deconvolution  problem.  Pre¬ 
dictive  deconvolution  refers  to  the  case  in  which  the  output 


Af-  I  £- I 

2  akyt-k  =  2  brxt-r 
k=0  r=0 

Application  of  the  z-transform  to  both  sides  yields 
Af  -  1  £-1 

2  <U*-*y(r)=  2  ftr*~'*(z) 

k=0  r=  0 


SARKAR  et  at:  IDENTIFICATION/ APPROXIMATION  PROBLEMS 


93 


or 


As  bt  =  0  for  r  >  L  (from  (33)),  one  may  write 


jr,  v  -  '1  =  *!> 

X(!)  _k  A(z) 

Zs  a*z 

k-o 


(29) 


Therefore,  a  recursive  filter  has  a  transfer  function  which 
is  expressed  as  a  ratio  of  two  polynomials  of  the  /-transform 
variable  The  objective  here  is  to  synthesize  the  filter  from  the 
given  impulse  response  of  the  system.  It  is  important  to  note 
that,  unlike  the  previous  three  types  of  filters,  this  is  not  an 
all-pole  filter  but  a  pole-zero  model.  In  other  words,  given 
some  desired  filter  operator  £>(z)  (which  is  the  transfer  func¬ 
tion  of  a  desired  system  having  the  impulse  response  dt), 
we  require  the  polynomials  A(z)  and  B(z)  of  the  filter 


F(z)  = 


A(z) 


(30) 


Af-  1 

/,  =  —  ^  .  for  t>L.  (34) 

/=! 

By  assuming  the  approximation  in  (31)  ts  valid,  the  coef¬ 
ficients  /,  approximate  the  coefficients  d,  for  I  >  1.  Equa¬ 
tion  (34)  may  then  be  approximated  by 


M~  1 

d,  3:  —  2  ajdt-j,  for  r  =  L  +  1 ,  L  +  2,  .A'  —  1 

/=t 


(35) 


The  error  in  (35)  is  given  by 


Af-1 

e,  =  d,+  2«/d,-/ 
1=1 


such  that 

F(z)  *  £>(z)  =  d0  4 ■  d1z~1  +  d2z-2  4 - _  ,z~*  +1 . 

(31) 

A  technique  for  determining  A(z )  and  B(z)  is  outlined 
next. 

A(z)=  1  +axz~l  +  a2z~ 2  +  -  +  aM_  tz~M  * 1  (32) 

and 

B(z)  =  60  +  b,z~  1  +  h2z-2  +  - +bL_xz~L*'  (33) 


Af-  t 

=  2 


a/d,_j,  since  a0 


1, 


forr  =  I  +  \,L  +  2,  •••,  K~  I.  (36) 

The  aj  coefficients  are  chosen  in  such  a  way  that  the  mean- 
squared  error  is  minimized.  In  particular, 


K  ~  1  K  -  1 

2  2 

r*=i  +  i  r=L  +  t 


AT  -  I 

X  a/dr-/ 

L  f-o 


(37) 


is  minimized.  By  differentiating  /  of  (37)  with  respect  to 
a i  and  equating  the  derivatives  to  zero,  a  set  of  simultaneous 
equations  is  obtained.  They  are  given  by 


where  M  and  L  are  arbitrary  numbers  which  fix  the  number  of 
poles  and  zeros,  respectively,  for  the  filter.  Also,  divide  B(z ) 
by  A{z)  so  as  to  obtain  the  infinite  series 


=  0, 


fY*)  =/o+ /iz"‘+ /zz~  2 + 


for  k  =  1,  2,  -,M-  1. 


(38) 


Since  F(z)A(z)  =  B(z),  it  follows  that 
F(z){l  +  a,r_l  +  a2z~2  +  -  +  <2M_,z-M+1} 
=  b0  +  blz~l  +  b2z~  2  +  •••  +  *£, -tz_i+l' 


For  a  realizable  filter  the  numerator  polynomial  is  generally 
one  degree  lower  than  the  denominator  polynomial  (if  the 
poles  are  simple).  Thus 

L  +  1  =  M. 


Since  multiplication  of  z -transforms  is  equivalent  to  convolu¬ 
tion  of  the  discrete  time  series,  the  series  of  bt  coefficients 
is  equal  to  the  convolution  of  the  f,  coefficients  with  the  a, 
coefficients. Or,  equivalently, 

M-i 

b,=  X 

i*  o 

By  assumption,  a0  =  1 .  Therefore, 

Af-I 
i=  t 


Hence,  (38)  reduces  to 
m-\  Tat  - 1 

2  a/  £  d,_jd,_k 

/= 0  L  r=Af 

which  is  equivalent  to  the  covariance  equations  as  given  by 
( 1 0).  The  right  side  of  ( 1 0)  is  zero  because 

K  -  1 

Dj  =  2  8f-i  =  0,  for/=  1,  2,  — ,  M—  1. 

The  poles  for  the  filter  are  then  obtained  by  the  solution  of 


=  0 


(39) 
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the  polynomial  equation 


estimate  j„  is  orthogonal  to  the  measured  data,  or 


M  1 

2 


i-  0 


a,  i 


t 


=  0. 


F[(s„  -  sn)d,r]  =  0,  for  i  =  0,  1 .  ,  M  —  1 .  (43) 

Expressed  in  a  different  way 


It  was  shown  previously  that  the  inverse  filter,  assuming  an 
all-pole  model  for  the  signal  spectrum,  is  also  equivalent  to 
(10)  We  conclude,  therefore,  that  the  poles  for  an  all-pole 
model  correspond  to  the  identical  set  of  poles  for  a  pole-zero 
model  for  the  same  order  filter. 

Next  the  residues  at  the  poles  (or  equivalently,  the  numera¬ 
tor  polynomial)  can  be  obtained  by  minimizing  the  mean- 
squared  error  given  by  S({/r  -  d,}2 . 

It  is  interesting  to  note  that  when  the  numerator  polynomial 
is  realized  directly  in  the  form  presented  in  (33),  the  problem 
reduces  to  the  case  of  the  Pade  approximation  (131.  Mathe¬ 
matically,  Pade  approximation  results  in  an  approximation  of 
Diz  )  by  Fu)  such  that  the  seminorm 

II  D(:)-F(z)  II  =  tf>(l)-F(l)|  +  U>'(l)-F'(l)| 

+  -  +lDL+M-i(l)-FL*M~  1  (I ) |  (40) 

is  made  zero.  Here  D*(  1)  represent  the  Jtth  derivative  of  Diz) 
evaluated  on  the  unit  circle. 

£.  Kalman  Filter  Theory 

Underlying  Wiener  filter  design  is  the  so-called  Wiener- 
Hopf  integral  equation,  its  solution  through  spectral  factoriza¬ 
tion.  and  the  practical  problem  of  synthesizing  the  theoreti¬ 
cally  optimal  filter  from  its  impulse  response.  The  normal 
Wiener  filter  is  derived  from  the  Wiener-Hopf  equation  and,  in 
general,  this  equation  can  be  solved  only  in  the  steady  state, 
i.e . ,  when  the  observation  interval  is  semi-infinite.  The  contri¬ 
bution  of  Kalman  was  recognition  of  the  fact  that  the  integral 
equation  could  be  converted  into  a  nonlinear  differential  equa¬ 
tion  whose  solution  contains  all  the  necessary  information  for 
design  of  the  optimal  filter.  The  problem  of  spectra!  factoriza¬ 
tion  in  the  Wiener  filter  is  analagous  to  the  requirement  for 
solving  M1M  +  1  )/2  coupled  nonlinear  algebraic  equations  in 
the  lf-order  Kalman  filter.  These  equations  can  be  solved 
numerically  for  transient  type  problems,  where  data  is  avail¬ 
able  only  for  a  finite  interval  This,  in  general,  results  in  the 
Kalman  filters  being  time  variant.  However,  in  the  steady- 
state  the  Kalman  filter  reduces  to  the  time  invariant  Wiener 
filter  [14]  The  presentation  by  Sorensen  (15}  expresses  the 
results  of  Kalman  filter  theory  in  a  way  that  makes  this  com¬ 
parison  easier 

The  problem  involves  estimating  a  signal  {r„},  from  meas¬ 
ured  data  f  J0  d, ,  ",  d g  _  , }.  If  the  estimate  is  computed  as 
a  linear  combination  of  the  d„,  then 


M-  1 

FfSnd/^I  =  Y.  A  iFfd.d.7*!.  for  i  =  0,  1 .  ",  M  —  1. 
i=o 

(44) 

However,  the  matrix  inversion  that  is  required  becomes  com¬ 
putationally  impractical  when  M  is  large,  Wiener  and 
Kolmogoroff  assumed  an  infinite  amount  of  data  (that  is, 
the  lower  limit  of  the  summation  is  -<*>  rather  than  zero). 
Equation  (43)  is  then  the  discrete  form  of  the  W'iener-Hopf 
equation  which  was  solved  using  spectral  factorization 

The  basic  difference  between  Wiener-iColmogoroff  theory 
and  the  Kalman  filter  theory  is  how  equation  (441  is  solved. 
In  19S5,  J.  W.  Follin  suggested  a  recursive  approach  to  solve 
(44)  It  is  clear  (see  15.  p.  65)  that  Follin’s  work  provided  a 
direct  stimulus  for  the  work  of  Richard  Bucy.  which  led  to 
his  subsequent  collaboration  with  Kalman  in  the  total  de¬ 
velopment  of  the  "state  space"  approach  for  obtaining  the 
filter  equations. 

F.  Summary 

As  outlined  above,  all  forms  of  the  digital  Wiener  filter 
lead  either  to  the  covariance  or  the  autocorrelation  equations. 
It  is  also  interesting  that  the  same  set  of  poles  is  obtained 
whether  one  models  the  signal  as  an  all-pole  model  or  as  a 
pole-zero  model. 

Ill  STOCHASTIC  METHODS  APPLIED  TO 
SYSTEM  IDENTIFICATION 

Three  different  stochastic  methods  used  in  spectral  estima¬ 
tion  are  presented  in  this  section.  They  are  maximum  likelihood 
estimation,  minimum  predictor  error  variance  estimation,  and 
maximum  entropy  spectral  analysis.  All  three  models  are  con¬ 
sidered  as  all-pole  models.  These  methods  have  no  relation¬ 
ship  with  discrete  Wiener  filtering  theory.  The  methods 
presented  in  this  section  start  with  completely  different  as¬ 
sumptions  but  finally  yield  the  identical  set  of  either  co- 
variance  or  autocorrelation  equations  which  characterize  the 
system  to  be  identified  from  the  measured  impulse  response. 

In  these  methods  it  is  assumed  that  the  data  samples  are 
part  of  a  random  process.  The  problem  is  to  choose  the 
parameters  of  the  system  impulse  response  so  as  to  make  the 
probability  of  occurence  of  the  actual  observation  most 
likely.  In-  othet;  words,  the  system  parameters  are  chosen  in 
such  a  way  that  the  probability  density  function  defining  the 
parameters  is  r  axir.iized. 


M  I 

in  =  2  Aid>  <41> 

(=0 

The  M  coefficients  A,  are  chosen  in  such  a  way  that  the  mean- 
squared  error. 


A.  Maximum  Likelihood  Estimation  Theory 

In  this  approach  the  measured  impulse  response  of  the 
system  is  considered  as  a  segment  of  a  random  process  It  is 
further  assumed  that  the  impulse  response  can  be  generated 
by  passing  an  uncorrelated  noise  sequence  {e(j  through  an 
all-pole  model  of  the  form 


/  =  F((sn-sn)T(Sn-in)],  (42) 

is  minimized.  Here  T  denotes  the  transpose  of  the  row  vector 
(s„  -  s„)  This  criterion  is  satisfied  when  the  error  in  the 


1 

F(z) 


I 


fa  ‘ 

t-0 


M- 

2 


with  /0  =  1 


(45) 
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Next  the  random  process  {<■,}  is  assumed  to  be  Gaussian 
with  zero  mean  and  variance  o,2.  Thus, 

E{e, }  =  0  and  A'{e,e;}  =  6,yo,,2 .  (46) 

As  the  measured  impulse  response  x,.  t  =  0,  1,  ,  A  -  1 

has  been  assumed  to  be  generated  by  passing  the  noise  se¬ 
quence  {er;  through  the  all-pole  model,  it  follows  that 

M-  1 

2  fixt-  i  =  et  <47) 

From  (46)  and  (47)  it  is  clear  that  the  sequence  x,  is  Gaussian 
with  zero  mean  and  a  cross  correlation  defined  by 

£(*/*/]  =*/-/■  (48) 

This  correlation  sequence  g,_;  would  then  be  a  function  of 
the  system  parameters  i  =  0.  1,  Af  -  1  and  oe2  Since 
x,  is  Gaussian,  a  Gaussian  multivariate  probability  density 
function  is  defined  for  the  sequence  of  random  variables 
x0,  jt|,  xa  -  t  The  maximum  likelihood  theory  assumes 
that  the  parameter  values  which  make  the  measured  observa¬ 
tion  of  the  impulse  response  most  likely  are  the  same  values 
which  maximize  the  joint  probability  density  function  of 
x,,  i  =  0,  1,  K  -  1.  This  can  be  achieved  by  differentiating 
the  density  function  with  each  of  the  unknown  variables, 
/, ,  f2,  ■■■,  fsi-  i  and  ae 2  and  then  setting  the  first  partial  de¬ 
rivative  equal  to  zero.  The  solutions  of  the  set  of  equations 
then  yield  the  values  for  the  unknown  parameters  Even 
though  the  procedure  is  c  nceptually  simple,  the  set  of  equa¬ 
tions  becomes  extremely  nonlinear  for  M  greater  than  2  and 
no  exact  solution  for  this  problem  exists  (2) . 

However.  Itakura  and  Saito  [161,  1 1 7 1  solved  the  maxi¬ 
mum  likelihood  problem  by  making  some  additional  assump¬ 
tions  First,  the  number  of  data  points  A  is  made  much  greater 
than  Af,  the  order  of  the  filter  (ie.A  >  M).  Second,  the  joint 
probability  density  function  for  the  sequence  x0.  x,.  x2.  —, 
xg  _  i  is  approximated  by 

p{x0,x,,  ■,  xK  _  ,}  =  [  2noe2  ]  */2  exp  [-a/2of2]  (49) 

where 


where 

k  i  r*f  i  "]  2 

9=2  X  i  (52 1 

r=  At  L  i=  o 

Again  it  is  clear  that  Or  has  the  same  form  as  the  error  energy 
defined  for  the  covariance  equations  for  the  case  of  linear 
prediction  (see  (221)  Maximization  of  the  conditional  prob¬ 
ability  density  function  is  then  achieved  by  maximizing  pt 
with  respect  to  the  unknown  filter  coefficients  /,  and  or2 
The  set  of  equations  (see  ilOl)  obtained  in  this  case  are 
identical  to  those  of  the  covariance  method. 

B  Minimum  Predictor  Error  l  ariance 

In  this  method  the  data  samples  are  not  considered  as  a 
part  oi  a  Gaussian  process.  In  other  words,  the  method  re¬ 
mains  the  same  as  before,  i  e . ,  the  measured  impulse  response 
is  generated  by  passing  a  noise  sequence  [et;  tin  iugji  ar.  all¬ 
pole  model  [2J  It  is  assumed  that  the  error  sequence  is  of 
zero  mean  and  of  variance  given  by 

M  M 

E[et2}  =  2  X  fifjE[xt-ixt-j)  (S3) 

1=0  1=0 

Next  the  process  is  assumed  to  be  stationary  so  that  the  ex¬ 
pectation  in  (53)  can  be  expressed  as 

£[x,_,x(_;]  =gi_J  154) 

and  the  error  variance  as 

M  M 

*[*fl]=2  'Lfiffi-t.  (5?» 

f=0  1-0 

The  problem  is  to  determine  the  filter  coefficients  so  as  to 
minimize  the  error  variance 

An  additional  assumption  is  now  made.  Specifically,  it 
is  assumed  that  the  process  is  ergodic  so  that  the  ensemble 
average  E  may  be  converted  to  a  time  average  Hence,  the 
approximation 


K  - 1  *M 

-  X 

r=  0 


2 


K  -  1 


(50) 


«i-i  = 


K-  M 


x,_,x, 


/  =  c„ 


t=M 


(56) 


It  is  interesting  to  note  that  a  has  the  identical  form  of  the 
error  defined  by  the  autocorrelation  equations  in  linear 
prediction  (see  (22)). 

It  has  been  shown  [16),  [17)  that  the  results  obtained 
for  the  unknown  filter  coefficients  /,  are  identical  to  (13) 
which  utilizes  the  autocorrelation  equations. 

The  corresponding  equations  for  the  covariance  method 
are  obtained  by  defining  a  conditional  density  function  for 
the  probability  density  This  is  achieved  by  treating  the  M 
data  points  x0,  x(,  — ,  xm  ,  as  a  set  of  deterministic  initial 
conditions  and  the  remaining  K  -  M  data  points  as  a  set  of 
random  variables  Under  the  above  assumptions,  the  con¬ 
ditional  probability  density  function  is  approximated  as 
1171.12) 

pA(xM  •  XM +  I  ■  XK  I  )  I  (*<)■  x\  •  XM  1  )} 

=  (27K>,2)~°  5(*  ~M)  exp  [-O/20,2]  (51) 


leads  to  the  covariance  equation  ( 10)  and  the  approximation 
,  K  -  If- /I-  1 

=  ~  2  *i*f+if-/l  =  r‘~ / 

n  r=o 

leads  to  the  autocorrelation  equations  (13).  Hence,  this 
method  yields  a  set  of  analysis  equations  identical  to  those 
for  the  discrete  Wiener  filter 

C.  Maximum  Entropy  Spectral  Analysis 

An  important  aspect  of  time  series  analysis  is  the  computa¬ 
tion  of  the  power  spectral  density  which  is  primarily  dc 
termined  by  the  second-order  statistics  In  an  actual  experi¬ 
ment,  the  number  of  data  points  is  always  finite  Hence,  for 
the  problem  of  interest  the  data  length  may  not  be  sufficient 
to  obtain  a  specified  degree  of  frequency  resolution.  Also, 
given  a  finite  number  of  A  data  point'  we  can  obtain  at 
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most  approximations  of  the  A  autocorrelation  functions 
r0.  r,.  ■■■,  rK  _  ,  In  the  previous  autocorrelation  formula¬ 
tions  the  data  has  been  assumed  to  be  zero  outside  the  known 
interval  In  some  instances,  this  may  be  an  unreasonable  as¬ 
sumption  about  the  extension  of  the  data  beyond  the  known 
interval  The  question  then  arises  as  to  what  assumptions 
should  be  made  about  the  data  outside  the  finite  sample  and 
what  assumption  should  be  made  about  their  second  order 
statistics  li  e.,  the  autocorrelation),  since  they  determine 
the  power  spectral  density 

Burg  proposed  an  information  theory  approach  to  the 
problem.  He  suggested  [18]  that  the  most  reasonable  choice 
of  the  unknown  autocorrelations  is  the  one  which  adds  no 
information  or  adds  most  randomness  or  maximizes  the 
entropy  He  then  proceeded  to  select  the  power  spectral 
density  having  the  maximum  entropy  of  all  possible  spectra 
that  agrees  with  the  known  values  of  the  autocorrelation 
function  r, 

The  information  content  of  a  random  process  is  defined 
in  terms  of  a  quantity  called  entropy  and  is  mathematically 
expressed  as 

H  =  -  2  pi  ln  pi  (58) 

i 

where  Pt  is  the  probability  of  the  /th  event  of  a  random  process. 
When  the  random  variable  takes  on  a  continuum  of  values, 
the  sum  in  the  definition  of  the  entropy  is  mplaced  by  an 
integral.  Since  we  are  dealing  with  a  time  senes  x0,  atj,  — , 
the  probability  is  replaced  by  the  joint  probability 
density  function  p(x0,  x,,  x*_  t ).  Thus 


where  X  is  the  column  vector  of  the  xt,  the  pnme  indicates 
the  transpose,  and  the  matrix  [/?*  ]  is  given  by 


l*tc] 


~'o  'I 

r,  r0 

9t  -l 
-  ft 


ft  -  I  ft  -v 

ft  -2  ft  -1  I 


r 


0 


(63) 


The  entropy  can  then  be  expressed  as  [  1 9 J ,  [201 

H  =  1  In  f(2rreyc  +1  det  [/?*  ]}  (64) 


Now  rK  is  to  be  chosen  in  such  a  way  that  H  in  (64)  is  maxi¬ 
mized  Hence  the  value  of  rK  is  the  one  which  maximizes 
det  If? jf  ] 

In  order  for  r,  to  constitute  a  proper  set  of  autocorrelation 
values,  the  matrix  [f?*!  must  be  positive  semi-definite  [21] 
Moreover,  det  [/?*•  ]  is  a  quadratic  function  in  rK  It  follows 
that  maximizing  det  [/?*)  with  respect  to  rK  yields  the 
value  of  rK  obtained  from  the  solution  of  the  following 
equation: 


det 


•2  ft 


rK  ft  I 


"’ft  -  2 

ft  -3 


=  0. 


(65) 


•  In  t>(x0,x,,  ■\xfc  ~\}dV  (59) 


Alternatively,  if  an  all-pole  K  -  1  order  model  is  chosen, 
then  from  the  previous  sections  we  know  that  the  autocor¬ 
relation  functions  for  this  problem  are  related  by  the  K 
unknowns  /]  as  r;  =  r_;  and 


where  dV  is  an  element  of  volume  in  the  space  spanned  by 
the  random  variables  Burg  then  proceeded  to  adjoin  a  hypo¬ 
thetical  variable  xk  to  the  available  estimates  of  the  autocor¬ 
relation  function  r0,  r,,  r2.  and  so  on.  We  may  then  consider 
the  joint  probability  density  available  for  the  K  data  points 
and  the  adjoined  X*  as 

P(*0.*l.  "  ■%  -1<*K  )  (60) 

This  probability  density  function  has  an  entropy 


•  In  {p(x0.  x,,  xK  _  i,xK  )}  dV  (61) 

Burg  chose  as  (60)  that  probability  density  function  which 
has  its  first  K  second-order  moments  as  r0,  rt,  •••,  ?k-\, 
and  which  under  the  given  constraint  maximized  (61).  The 
obvious  choice  for  the  probability  density  function  in  (60) 
is  Gaussian  since  according  to  Shannon  and  Weaver  119], 
[20 1  the  Gaussian  distribution  results  in  maximum  entropy 
under  a  constant  energy  constraint.  Thus 

pi *0*1.**  -i.  **) 

K  +  1 

(2»r)  —  det  [/?*  ] 


K  -  1 

fj  +  2  fkri-k  =  0.  for)  =  1,  2,  A’.  (66) 

k=  1 

This  is  identical  to  (23),  for  /0  =  1  The  set  of  K  equations 
in  K  -  1  unknowns  in  (66)  indeed  has  a  solution  which  is 
found  by  solving  the  first  K  -  1  equations.  The  last  equa¬ 
tion  can  be  seen  to  be  a  linear  combination  of  the  first  K  -  1 
equations  In  fact,  the  determinant  of  the  above  set  of  equa¬ 
tions  in  (66)  is  identical  to  that  of  (65).  In  this  sense.  (66) 
is  consistent  with  the  maximum  entropy  method. 

Thus  it  is  shown  that  the  extrapolated  autocorrelation 
functions  coincide  with  those  functions  which  would  have 
been  predicted  by  the  mode’  of  equation  (66)  Hence  this 
procedure  is  equivalent  to  the  all-pole  model  described  by 
the  maximum  likelihood  estimation  ( 20 ] -[  23 ] 

Since  so  far  as  the  second-order  statistics  are  concerned, 
the  sampled  data  x,  may  be  modeled  arbitrarily  closely  by 
an  all-pole  model  of  order  A',  we  may  view  the  above  process 
as  an  autoregressive  process  with  input  (white  noise)  and 
output  (x,)  where  the  filter  is  described  by  the  transfer  func¬ 
tion  [22], [23] 


«(*)-— - 

i  +  2 


(67) 
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*<>(/)  =  W)  I  *(/'<*>)  I*  (68) 

where  S0(f)  and  S,(  f )  are  the  output  and  input  power  spectral 
density  respectively  The  power  spectral  density  of  the  proc¬ 
ess  x ,  has  been  shown  to  be  1 20]  -[  23] 

Siif) 

S0if)= - J - | - ^  (69) 

1  +  2  f<  CXP  ( — /-Tr/i^r] 

/=  i 

i 

where  S,(/)  is  the  power  spectrum  of  the  white  noise  driving 
the  filter. 

Thus,  for  M  =  K.  identical  analysis  equations  are  obtained 
by  the  maximum  entropy  spectral  analysis  and  by  the  maxi¬ 
mum  likelihood  estimation  theory  It  has  also  been  shown  else¬ 
where  that  this  is  indeed  so  [20)— l  —  3). 

IV  PROXY'S  ALGORITHM  AND  ITS  EXTENSIONS 

In  the  previous  two  sections  the  system  identification 
problem  was  treated  as  a  stochastic  problem  The  measured 
impulse  response  was  characterized  by  a  random  process.  In 
this  section  a  different  approach  is  taken  in  that  the  noise 
contaminated  impulse  response  is  processed  as  though  it 
were  a  deterministic  process  The  problem  is  to  determine 
the  poles  and  residues  which  characterize  the  measured  im¬ 
pulse  response 

Historically.  Prony  was  the  first  to  make  an  attempt  at 
fitting  experimental  data  with  complex  exponentials  In 
1795  Prony  postulated  that  the  basic  laws  dealing  with  gas 
expansion  can  be  expressed  as  a  sum  of  exponentials  He 
demonstrated  that,  given  2M  data  points,  it  is  possible  to 
fit  exactly  M  exponentials  to  the  data  at  those  points  PTony 
must  have  experienced  great  frustration  when  he  applied 
his  method  due  to  the  extreme  sensitivity  of  the  exponent 
to  the  accuracy  of  the  measured  data.  Accuracy  requirements 
have  been  investigated  by  Lanczos  [24],  However,  in  some 
cases  of  EMP  problems,  the  exponentials  encountered  are 
complex  and  they  are  approximately  at  harmonic  frequencies 
Moreover,  the  real  parts  of  the  complex  exponentials  are 
much  smaller  than  the  imaginary  parts.  Hence,  the  damped 
exponentials  in  the  case  of  electromagnetic  pulse  problems 
are  more  closely  orthogonal  and,  thereby,  create  fewer  prob¬ 
lems  with  regard  to  accuracy  than  is  evidenced  in  the  example 
by  Lanczos  The  contamination  of  data  by  noise  creates 
grave  problems  for  extracting  M  correct  exponents  from  2M 
data  points.  Hence,  more  data  are  necessary  and  a  semi-least 
squares  approach  to  Prony's  method  is  taken.  The  details 
of  Prony's  method  are  well  known  ( 25 J  — ( 3 1 )  and  are  omitted 
in  this  paper  The  final  equations  that  ultimately  result  are 
the  same  as  either  the  autocorrelation  or  covariance  equations 
of  linear  prediction  McDonough  [13]  and  Van  Blaricum 
[251,  in  their  Ph  D  dissertations,  and  Markel  and  Gray  [21 
used  the  covariance  equations  (10).  Thus,  the  semi-least 
squares  Prony’s  method  is  equivalent  to  a  /W-length  Wiener 
prediction  filter  The  term  semi-least  squares  has  been  ap¬ 
plied  because  the  true  least  squares  problem  would  give  rise 
to  a  set  of  coupled  nonlinear  equations  (261.  The  true  least 
squares  problem  has  been  defined  as  in  [281. 

Prony's  method  has  found  wide  applications  in  finding 
poles  and  residues  of  transient  waveforms  through  the  works 
of  Miller,  Brittingham,  and  Willows  [27J . 
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V  CONCLUSION 

We  have  demonstrated  that  identical  analysis  equations 
can  be  obtained  by  several  different  techniques  even  though 
they  start  with  formulations  based  on  different  assumpt;  '.ns 

However  a  major  objection  with  all  these  techniques  is 
that  there  seems  to  be  no  compelling  reason  for  the  shout-  o! 
fl0  -  1  rather  than  ]  =  1 ,  or  for  that  matter  the  choice 
of  any  a,  to  be  unity  Yet  each  of  these  choices  leads  to  a 
different  set  of  exponents  (13)  As  an  example  consider  the 
signal  exp  l  —  0.0003 5r )  cos  (0.25r)  For  this  signal  we  have 
s,  2  =  exp  1-0  00035  ±  /0.25J  =  0.97  i  /0.25  Let  the 
length  of  data  points  be  A  =  5  and  the  order  of  the  filter 
M  —  2  Then  fjr,}  is  the  sequence  •  1  00.  0  9*.  0  88.  0  "3.  0  54 
for  the  sequence  \t  =  0,  I,  2,  3,  4  ■  To  this  we  add  1  percent 
zero-mean  additive  noise  to  obtain  the  sequence  •x;nu“c-  = 
{ 1  01 .  0  96,  0.89.  0  72.  0  54  •  Using  the  covanan,e  method, 
j,  2  for  this  signal  are  obtained  from  the  two  roots  of  the 
polynomial  equation  j0j‘  +  u,r  +  a2  -  0  If  one  assumes 
a0  =  1,  then  the  computed  2  -  0  8~  i  ;. 23.  If  one  assumes 
a,  =  1 .  then  the  computed  s  t  j  =  0.98  i  j,22  If  one  assumes 
a 2  =  1 .  then  the  computed  jj  2  -  1  1  —  / .  1 4 . 

The  above  example  thus  illustrates  the  dependence  of  the 
exponents  on  the  choice  of  a,  to  be  unity 
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Alternatively ,  if  an  all  pole  K— 1  order  model  is  chosen,  then 
from  the  previous  sections  we  know  that  the  autocorrelation  functions 
tor  this  problem  are  related  by  the  K  unknowns  f.  as  r."  r  and 
K-1 

+  I  Vi-k  "  0,  for  j  -  1,2,  ....  K.  (3.221 

J  k-1  *  2  k 


This  is  identical  to  (2.23),  for  f^  *  1.  The  set  of  K  equations  in 
K-1  unknowns  in  (3.22)  indeed  has  a  solution  which  is  found  by  solving 
the  first  K-1  equations.  The  last  equation  can  be  seen  to  be  a 
linear  combination  of  the  first K-1  equations.  In  fact,  the  determinant 
of  the  above  set  of  equations  in  (3.22)  is  identical  to  that  of  (3.21). 
In  this  sense,  (3.22)  is  consistent  with  the  maximum  entropy  method. 

Thus  it  is  shown  that  the  extrapolated  autocorrelation  functions 
coincide  with  those  functions  which  would  have  been  predicted  by  the 
model  of  equation  (3.22).  Hence  this  procedure  is  equivalent  to  the 
all-pole  model  described  by  the  maximum  likelihood  estimation  [20-23]. 

Since  so  far  as  the  second  order  statistics  are  concerned,  the 
sampled  data  x^  may  be  modeled  arbitrarily  closely  by  an  all-pole 
model  of  order  K,  we  may  view  the  above  process  as  an  autoregressive 
process  with  input  (white  noise)  and  output  (xt)  where  the  filter  is 
described  by  the  transfer  function  [22,  23] 


1 

K-1 

1  +  l  v 

i-1  1 


C3.23) 


22  D.E.  Smylie,  G.K.C.  Clarke  and  T.J.  Ulrych,  "Analysis  of  Irregu¬ 
larities  in  the  Earth's  Rotation,"  Methods  in  Computational  Physics, 
Vol.  13,  New  York:  Academic  Press,  1973,  pp.  391-430. 

23  S.L.  Marple,  "Conventional  Fourier,  Autoregressive  and  Spectral 
Methods  of  Spectral  Analysis,"  Ph.D.  Dissertation,  Stanford  University, 
Palo  Alto,  California,  1976. 


34 


Also 


S0(f)  -  S.  (f)  |H(jui)  j  2  (3.24) 

where  SQ(f)  and  S^Cf)  are  the  output  and  input  power  spectral  density, 
respectively.  The  power  spectral  density  of  the  process  has  been 
shown  to  be  [20-23] 


s0(f) 


S.(f) 


1  +  J  f  exp[-j  2irfiAt  ] 

i-1 


(3.25) 


where  S^(f)  is  the  power  spectrum  of  the  white  noise  driving  the 
filter. 

Thus,  for  M  ■  K,  identical  analysis  equations  are  obtained  by 
the  maximum  entropy  spectral  analysis  and  by  the  maximum  likelihood 
estimation  theory.  It  has  also  been  shown  elsewhere  that  this  is 
Indeed  so  [20—23]. 
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IV.  PHONY'S  ALGORITHM  AND  ITS  EXTENSIONS 


In  tbe  previous  two  sections  the  system  identification  problem 
was  treated  as  a  stochastic  problem.  The  measured  impulse  response 
was  characterized  by  a  random  process.  In  this  section  a  different 
approach  is  taken  in  that  the  noise  contaminated  impulse  response 
is  processed  as  though  it  was  a  deterministic  process.  The  problem 
is  to  determine  the  poles  and  residues  which  characterize  the  measured 
impulse  response. 

Historically,  Prony  was  the  first  to  make  an  attempt  at  fitting 
experimental  data  with  complex  exponentials .  In  1795  Prony  postu¬ 
lated  that  the  basic  laws  dealing  with  gas  expansion  can  be  ex¬ 
pressed  as  a  sum  of  exponentials.  He  demonstrated  that,  given  2M 
data  points,  it  is  possible  to  fit  exactly  M  exponentials  to  the 
data  at  those  points.  Prony  must  have  experienced  great  frustratior 
when  he  applied  his  method  due  to  the  extreme  sensitivity  of  the 
exponent  to  the  accuracy  of  the  measured  data.  Accuracy  requirements 
have  been  investigated  by  Lanczos  [24],  However,  in  seme  cases  of 
EMP  problems  the  exponentials  encountered  are  complex  and  they  are 
approximately  at  harmonic  frequencies.  Moreover,  the  real  parts  of 
the  complex  exponentials  are  much  smaller  than  the  imaginary  parts. 

Hence,  the  damped  exponentials  in  the  case  of  EMP  problems  are  more  close 
orthogonal  and,  thereby,  create  fewer  problems  with  regard  to  accuracy 

24  C.  Lanczos,  Functional  Analysis.  Englewood  Cliffs,  N.J.: 

Prentice  Hall,  1956,  pp.  272-279. 


36 


chan  is  evidenced  in  che  example  by  Lanczos.  The  contamination  of 
data  by  noise  creates  grave  problems  for  extracting  M  correct  exponents 
from  2M  data  points.  Hence,  more  date  are  necessary  and  a  semi-least 
squares  approach  to  Prony's  method  is  taken.  The  details  of  Prony's 
method  are  well  known  and  are  omitted  in  this  report.  The  final 
equations  that  ultimately  result  are  the  same  as  either  the  auto¬ 
correlation  or  covariance  equations  of  linear  prediction.  McDonough 
[13]  and  Van  Blaricum  [25],  in  their  Ph.D.  dissertations,  and  Markel 
and  Gray  [2]  used  the  covariance  equations  (2.10).  Thus,  the  semi¬ 
lease  squares  Prony's  method  is  equivalent  to  a  M- length  Wiener  pre¬ 
diction  filter.  The  term  semi-least  squares  has  been  applied  because 
the  true  least  squares  problem  would  give  rise  to  a  set  of  coupled 
nonlinear  equations  [26].  The  true  least  squares  problem  has  been 
defined  as  in  [26], 


4.1  Various  Extensions  to  Prony’s  Method 

The  reason  for  prt  .'enting  this  section  is  to  show  that  for  a 
particular  extension  to  Prony's  method  a  procedure  similar  to  the 
pencil-of-functions  method  is  posed  in  a  Hilbert  space.  Yet  each  of 
them  yields  a  different  answer.  Hence,  the  way  in  which  a  problem 
is  developed  is  extremely  important. 


25  M.  Van  Blaricum  and  R.  Mittra, "Techniques  for  Extracting  the 
Complex  Resonances  of  a  System  directly  from  its  Transient 
Response;  Interaction  Note  301,  December  1975.  (Also  in  IEEE 
Trans,  on  Antennas  and  Propagation,  Vol.  AP-23,  No.  6,  Nov.  1975.) 

26  R. N.  McDonough  and  W.H.  Huggins , "Best  Least-Squares  Representation 
of  Signals  by  Exponentials,"  IEEE  Trans.  AC-13,  No.  4,  pp.  405- 
412,  August  1968. 
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27  D.F.  Tuctle,  "Network  Synthesis  for  Prescribed  Transient  Response," 

D.Sc.  Dissertation,  1948. 

28  W.E.  Kautt,  "Approximation  over  a  Semi-infinite  Interval,"  M.S. 
Thesis,  M.I.T. ,  1948. 


38 


The  expansion  of  x^  as  a  linear  combination  of  its  derivatives 
is  inappropriate  if  the  data  x^  are  not  everywhere  smooth  (for  example 
when  it  is  sampled).  Carr  [29]  extended  this  technique  by  integration 
of  the  differential  equation  (4.1)  k  times.  This  leads  to 


M-l 


l 

1-0  1  6 


(i-k) 


0  with  a  -  1 
o 


(4.5) 


where 


.(i-k) 


r  /^k“l  C  ^  (i) 

I  §  . J  xt±  dtl  dt2  dCk-l 


dtC4.6) 


-H*>  +00 


[Note:  The  lower  limit  of  the  integral  is  +®  and  it  is  assumed  hat 


(+»)  -  0  for  i  -  0,1,  ....  M-l.]  (4.7) 

Again,  if  the  data  are  noisy,  the  coefficients  (a^  are  obtained  from 
the  minimization  of  the  function 

[  l  a  xk“k)]2dt  (4.8) 

1  ^ 


The  exponents  s^  are  obtained  as  before  from  the  solution  of  the 
polynomial  equation  (4.4). 

It  is  interesting  to  observe  that  this  approach  is  very  similar 
to  the  pencil-of-function  method  as  discussed  in  section  VX.  For 
k  -  M-l,  it  is  obvious  that  the  data  x^  is  an  element  of  a  Hilbert 
space  spanned  by  the  data  and  its  successive  integrals  [30]. 


29  J.W.  Carr,  "An  Analytic  Investigation  of  Transient  Synthesis  by 
Exponentials, "  M. S,  thesis,  M.I.T.,  1949. 

30  M.J.  Narasimha  et  al,  "A  Hilbert  Space  Approach  to  Linear  Predic¬ 
tive  Analysis  of  Speech  Signals,  Tech.  Report  3606-10,  Radioscience 
Lab,  Stanford  Electronics  Lab,  Stanford  University,  California,  1974 
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The  solution  to  the  system  identification  problem  by  any  difference 


equation  leads  to  a  regularized  ill-posed  problem  which  has  been 

regularized  in  terms  or  how  close  the  actual  and  the  measured  responses 

are  instead  of  the  natural  frequencies  of  the  waveform.  This  is  described 
in  detail  in  the  next  section. 

A  major  objection  with  all  these  techniques  is  that  there  seems 

to  be  no  compelling  reason  for  the  choice  of  a  ■  1  rather  than  a^^  “  -» 

or  for  that  matter  the  choice  of  any  a ^  to  be  unity.  Yet  each  of 

these  choices  leads  to  a  different  set  of  exponents  [13].  As  an 

example  consider  the  signal  exp  {-  00035t]  cos  (0.25t).  For  this 

signal  we  have  2  “  exp  [-.00035  +  j0.25]  «  0.97  +  jO.25.  Let 

the  length  of  data  points  be  K  **  5  and  the  order  of  the  filter  M  ■  2. 

Then  {x^}  is  the  sequence  (1.00,  0.97,  0.88,  0.73,  0.54}  for  the  sequence 

{t«0,  1,  2,  3,  4}.  To  this  we  add  15  zero  mean  additive  noise  to 

obtain  the  sequence  {x^°^Se}  «■  {1.01,  0.96,  0.89,  0.72,  0.54}.  The 

s..  .  for  this  signal  are  obtained  from  the  two  roots  of  the  polynomial 
1  >  *• 

2 

equation  a^s  +  a^  s  +  a^  ■  0.  If  one  assumes  a^  ■  1,  then  the 
computed  ^  *  0.87  +  j.23.  If  one  assumes  a^  *  1,  then  the  computed 

s^  ^  "  0.98  +  j . 22.  If  one  assumes  a^  *  1,  then  the  computed 
s1  2  m  1.1  ±  j  *14. 

The  above  example  thus  illustrates  the  dependence  of  the  exponents 
on  the  choice  of  a^  to  be  unity. 
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V.  ILL-POSED  AND  WELL-POSED  PROBLEMS  OF 


SYSTEM  IDENTIFICATION 


The  system  identification  problem  is  almost  always  ill-posed.  (This 
is  reflected  by  the  fact  that  two  impulse  responses  with  drastically 
different  natural  frequencies  may  yield  almost  identical  outputs  for  the 
same  input.)  An  ill-posed  problem  can  be  regularized  however  by  imposing 
additional  constraints  on  the  system. 

We  begin  our  discussion  by  defining  the  concept  of  a  well-posed 
problem  along  the  lines  of  Tykhonov  [31-32]  and  Lavrentiev  [33].  In 
particular,  consider  the  operator  equation 

Xh  -  y  (5.1) 

where  the  operator  X  maps  an  element  in  the  space  H  to  an  element  in  the 
space  Y.  The  problem  of  solving  (5.1)  for  h  given  X  and  y  is  said  to  be 
well-posed  if  the  following  conditions  are  satisfied: 

1)  The  solution  to  (5.1)  exists  for  each  element  in  the  space  Y. 

2)  The  solution  to  (5.1)  is  unique  in  E. 

3)  Small  perturbations  in  y  result  in  small  perturbations  in  the  solution 
to  (5.1)  without  the  need  to  impose  additional  constraints. 

If  any  of  these  conditions  is  violated,  the  problem  is  said  to  be 

31  A.N.  Tykhonov,  "On  the  Solution  of  Incorrectly  Formulated  Problems 

and  the  Regularization  Methods,"  Soviet  Mathematics,  4,  1963  pp  .1035-10 

32  A.N.  Tykhonov,  "Regularization  of  Incorrectly  Posed  Problems," 

Soviet  Mathematics,  4,  1963,  pp.  1624-1627. 

33  M.M.  Lavrentiev, "Some  Improperly  Posed  Problems  of  Mathematical 
Physics,"  Springer-Verlag  Tracts  in  Natural  Philosophy,  Vol.  II, 
Springer-Verlag,  Berlin,  1967. 


41 


ill-posed.  It  is  important  to  realize  that  uncertainty  in  data  due  to 


measurement  error  may  cause  a  problem  to  become  ill-posed.  Specifically, 
this  results  when  a  noisy  measurement  of  y  produces  a  waveform  which  does 
not  belong  to  the  space  Y. 

When  (5.1)  was  introduced,  it  was  assumed  that  the  operator  X  was 
known  exactly.  When  there  is  uncertainty  in  X,  in  addition  to  uncertainty 
in  y,  the  problem  is  said  to  be  well-posed  in  the  wide-sense  provided 
condition  (3)  is  generalized  to  require  that  the  solution  h  depends 
continuously  on  both  X  and  y  (i.e.  small  perturbations  in  both  X  and  y 
should  produce  only  small  perturbations  in  h) .  For  example,  in  linear 
least-squares  problems  where  (5.1)  is  a  matrix  equation  in  a  finite 
dimensional  space,  the  solution  is  given  by 

h  -  (XT  XT1  XT  y.  (5.2) 

Since  the  generalized  inverse  of  a  matrix  does  not  depend  continuously 
on  its  matrix  elements,  the  problem  is  ill -posed  in  the  wide  sense. 
Interestingly  enough,  this  problem  is  well-posed  in  the 
narrow  sense.  If  the  determinant  of  the  matrix  is  very  small  or 
the  condition  number  (||X  j|  |!x"1|j  )  is  very  large,  then  the  problem 
is  numerically  ill-conditioned  [34].  Another  example  of  an  ill- 
posed  problem  is  the  integral  equation  of  the  first  kind. 

Hadamard  introduced  the  notion  of  a  well -posed  (correct,  properly 
posed)  problem  at  the  beginning  of  this  century  when  he  studied  the 
Cauchy  problem  in  connection  with  the  solution  of  Laplace's  equation 
[32].  He  observed  that  the  solution  did  not  depend  continuously  on  the 
data.  On  the  basis  of  this,  Hadamard  concluded  that  something  was 

34  M.Z.  Nashed,  "Some  Aspects  of  Regulariaztion  and  Approximation 
Solutions  of  Ill-Posed  Operator  Equations,"  Proceedings  of  the 
1972  Armv  Numerical  Analysis  Conference,  pp.  163-181. 
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wrong  with  the  problem  formulation  because  solutions  exhibiting  such 
type  of  discontinuous  dependence  do  not  correspond  to  physical  systems, 
i.e.  they  do  not  arise  in  the  study  of  natural  phenomena.  Other  mathe¬ 
maticians  of  that  time,  such  as  Petrovsky,  also  reached  the  same 
conclusion. 

Mathematicians,  such  as  Sadamard  and  Petrovsky,  reasoned  that  the 
mathematical  models  associated  with  the  ill-posed  problems  must  be 
incorrect.  However,  today  it  is  recognized  chat  their  definition  of 
a  well-posed  problem  is  lacking.  In  fact,  using  that  definition,  many 
"inverse"  problems  of  mathematical  physics  are  ill-posed.  This  includes 
most  radiation  and  scattering  problems  in  antenna  theory. 

In  order  to  avoid  difficulties  associated  with  the  original  definition, 
Tykhonov  suggested  that  the  three  conditions  be  restated  differently.  In 
addition  to  the  metric  spaces  H  and  1  and  the  operator  X,  let  there  be 
given  some  closed  set  C.l.  Me  call  the  problem  for  the  solution 

of  (5.1)  properly  posed  according  to  Tykhonov  if  the  following  conditions 
are  fulfilled: 

1)  It  is  required  that  the  solution  h  exists  for  some  class  of  data 
y  and  belongs  to  the  given  set  Hc>  h  t  H^. 

2)  The  solution  is  unique  for  the  class  of  solutions  belonging  to  R  . 

3)  Arbitrarily  small  changes  of  y  which  do  not  carry  the  solution  outside 
the  metric  space  Hc  correspond  to  arbitrarily  small  changes  in  the 
solution  h. 

Me  denote  by  the  image  of  after  application  to  the  space  H  of 
the  operator  X.  Requirement  3)  can  now  be  restated  in  the  following 
manner , 


A3 


[ 


3)  The  solution  of  equation  (5.1)  depends  continuously  on  the  right-hand 

side  v  which  is  a  member  of  the  set  H"  . 

c 

If  is  a  comoact  set,  the  following  statement  holds.  If  eauaticr. 
c 

(5.1)  satisfies  the  requirements  1),  2)  of  a  well-posed  problem  due  to 
Tykhonov,  then  there  exists  a  function  n(t)  such  that  [32] 

a)  a  (t)  is  a  continuous  nondecreasing  function  with  a(0)  »  0. 

b)  For  any  h^,  h„  t  satisfying  the  inequality  d(Xh^,  Xh^)  £ 
then  d(h^,  h.,)  £a  (£) 

Thus  the  requirement  of  continuous  dependence  is  satisfied  if  1)  and  2) 
are  satisfied. 

We  note  that,  if  a  problem  is  properly  posed  according  to 

Tykhonov  and  we  replace  the  metric  spaces  H  and  T  by  their  subspaces 

H  and  a  ,  then  the  oroblem  becomes  properly  nosed  in  the  usual  sense, 
c  c  ... 

The  necessity  of  examining  spaces  E,  Y  together  with  H  ,  is 

due  to  the  fact  that  in  real  problems  the  errors  committed  in  the 

determination  of  the  right-hand  side  y,  usually  lead  to  y  outside  of 

H^.  The  consideration  of  the  problem  according  to  Tykhonov' s  formulation 

gives  the  possibility  of  constructing  an  approximate  solution  with  a 

certain  guaranteed  degree  of  accuracy  in  spite  of  the  fact  that  an 

exact  solution  of  (5.1)  with  approximate  data  either  does  not  exist 

at  all  or  may  strongly  deviate  from  the  "true"  solution. 

The  new  set  of  three  conditions  may  be  summarized  as  follows. 

The  first  condition  guarantees  the  existense  of  X  ^  in  the  sense  that 

a  solution  may  proceed  by  choosing  a  complete  basis  from  the  compact 

set  H  in  order  to  project  v  and  Xh  (for  b  t  H  )  into  .  The 
c  c  c 

uniqueness  of  the  solution  is  guaranteed  by  condition  two.  Condition 

three  requires  the  continuity  of  the  solution  in  the  space  H  . 

c 
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■"we: j  ar.  ill-posed  problem,  "yknonov  has  regularized 
the  problem,  by  redefining  what  is  meant  by  a a  acceptable  solution. 
Basica.'ly,  the  idea  is  to  make  constructive  use  of  the  notions 
we  have  with  regard  to  a  physical  problem  by  which  we  determine 
a  certain  class  of  acceptable  answers  having  tiore-or-less 
acceptable  magnitudes  and  degrees  of  smoothness.  Regularization  of  an 
ill-posed  problem  need  not  be  confined  to  the  method  of  Tykhonov. 

Various  schemes  proposed  in  the  literature  have  involved  one  or  more 
of  the  following  concepts:  [34] 

a)  a  change  in  the  definition  of  a  solution 

b)  a  change  in  the  space  to  which  the  solution  belongs 

c)  a  change  of  the  operator  X 

d)  the  introduction  of  regularizing  operators 

e)  probabilistic  methods  or  well-posed  stochastic  extensions 
of  ill-posed  problems. 

Note  that  it  may  be  nossible  to  regularize  an  ill-posed  problem 
with  respect  to  one  set  of  variables  but  not  another.  Thus,  the  choice 
of  piecewise  triangles  or  piecewise  sine  functions  as  a  basis  for 
expanding  the  current  distribution  on  an  antenna  by  the  method  of 
moments  results  in  a  regularized  ill -posed  problem  with  respect  to  the 
current  distribution  on  the  antenna  structure.  However,  the  problem 
is  not  regularized  with  respect  to  change  because  the  change  distribution 
obtained  in  this  manner  is  discontinuous.  As  a  point  of  interest,  the 
method  of  moments  regularizes  an  ill-posed  scattering  or  a  radiation 
problem  by  the  introduction  of  concepts  (b)  and  (c) . 

In  a  system  identification  problem,  the  objective  is  to  find  the 
impulse  response  h(t)  of  a  linear  time- invariant  system  when  the  input 


X 


The  input-output 


(t 1  and  tne  output  da' a  y(c)  to  a  system  are  known, 
relationship  for  a  causal  system  is  described  by 

t 

r 

v(t )  ”  x(t-r ''h (." )d*  ~  Xh 
'  0 

In  ptoriems  arising  in  system  identic ication  we  are  usually  certain 
of  the  existence  of  the  function  h(t)  that  appears  in  the  integrand  in 
equation  (5.3).  Its  uniqueness  can  also  be  guaranteed.  However  even  if  the 
solution  exists  and  is  unique,  eq.  (5.3)  can  have  for  a  specific  X  a 
peculiarity  which  makes  the  problem  an  incorrectly  posed  one.  This 
peculiarity  arises  from  the  "smoothing"  action  of  the  convolution  operator  X. 
This  is  illustrated  with  the  following  example. 

Consider  two  continuous  functions  h^(t)  ■  h(t)  and  h?(t)  "  h7 (t)  +  C  sin  u> 

It  is  clear  that  even  for  a  very  large  value  of  C,  we  can  choose 

sufficiently  a  large  value  of  w  such  that  the  difference  between 
y  -  Xh  and  y,,  »  Xh?  is  less  in  absolute  value  than  any  previously  given 

(arbitrarily  small)  number  t,  i.e  the  operator  X  "smoothes"  out  a  very 
intense,  but  adequately  high-frequency  component,  to  an  extremely  small 
level.  The  presence  of  disturbances  accompanying  the  function  y(t) 
makes  the  problem  ill-posed.  For  instance,  assume  that  experimental 
conditions  permit  agreement  of  the  measurement  d(t)  with  the  measured 

function  y(t)  only  to  within  an  error  o 
max 

0<t<°= 

Xt  is  easy  to  see  that  if  the  operator  X  of  (5.3)  has  the  smoothing  action 
we  have  described,  then  we  can  always  find  two  junctions  h^(t)  and  h?(t) 
whose  transforms  »  Xh,  and  y^  ■  Xh?  both  satisfy  (5.4).  Accordingly, 
there  are  at  least  two  different  functions  that  satisfy  (5.3)  with  d(t) 
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aeasurec  bv  experiment  to  wicnin  ar.  error  :.  In  fact,  there  exists  an 
-n-~nice  set  o:  suon  : unctions ,  whose  members  may  differ  from  each  other 
ov  as  mucr.  as  we  o.ease.  It  is  in  this  situation  that  the  "incorrectness" 
or  tne  oroolen  rorau.ation  i.l  actually  lies. 

•n  an.enaa  oro:. -eas ,  tne  situation  is  not  that  severe  due  to  the 
nigr._v  p*a».«c  .cap-  tne  tercel  exp  (~1kr)/r. 

5upooae  aov  -;e  measured  function  is  d(c)  -  y(t)  +  e(t)  where  e(t)  is 
the  none  or  the  system.  The  noise  is  assumed  to  be  a  stationary 
random  process  with  cere  mean  and  correlation  function  4>(T). 

The  solution  or  the  oroblem 

mo 

r  *ct  •  *o  t  c/o  dL  r 


■W*  J 


(5.5) 


is  formally  obtained  in  terms  of  Fourier  transforms  of  the  output  and 

input  by  means  of  the  expression 

***  +■ 


) 


\ 


AM 


1 


(5.6) 


^0-0  A.U3 
-oc  i  O'*. ) 

where  the  symbol  ~  denotes  the  Fourier  transform  of  the  corresponding 

function.  Of  interest  is  Che  variance  of  the  function  h(t)  when  instead 

of  y  we  use  d  *  y  +  e  in  (5.5).  The  variance  is  derived  in  [353  as 

c  w  *  i-w  «.-> 

where  £>  (w)  is  the  power  spectrum  of  the  noise.  Note  for  finite  energy 
signals  that 

|  X  l  — *•  °  fa*  1°^  - 5r“°-  (5.g) 


In  order  that  the  variance  of  the  solution  remains  finite,  the  power 
spectrum  fit'*)  of  the  noise  must  fall  off  sufficiently  rapidly  as 


35  V.F.  Turchin  et  al,  "The  Use  of  Mathematical  Statistics  Methods 
in  the  Solution  of  Incorrectly  Posed  Problems,"  Soviet  Physics 
Uspekli  19,  1971,  pp.  681-703. 
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This  imposes  severe  restrictions  on  the  class  of  processes  e(t)  that  are 
admissible  as  noise.  Commonly,  these  conditions  are  not  satisfied;  since 
the  noise  is  usually  assumed  to  contain  a  background  "white  noise"  component. 
Consequently  as  |*5^-»**  the  spectrum  £(.&)  approaches  a  nonzero  constant 
limit.  Then  the  variance  in  (5.7)  is  infinite.  Hence i unsatisfactory 
solutions  are  obtained  when  the  experimentally  found  function  d(t)  is 
substituted  for  y(t)  in  (5.5).  The  source  of  the  difficulty  is  obvious: 
the  high  frequency  components  of  d(t),  which  arise  from  the  presence 
of  noise  and  which  are  not  present  in  the  true  function  y(t) ,  produce 
large  oscillations  in  the  solution. 

It  is  useful  to  examine  the  situation  needed  for  (5.5)  to  be  a 
correctly  posed  problem  in  the  presence  of  white  noise.  In  particular, 
if  xCCfci)  is  a  rational  fraction,  the  numerator  polynomial  must  be  of 
higher  degree  than  the  denominator  polynomial.  This  requires  in  x(t) 
the  presence  of  singularity  fractions  such  as  doublets,  triplets,  etc., 
all  of  which  have  infinite  energy. 

The  classical  Wiener  problem  is  also  ill-posed.  The  solution  is 
determined  from  the  orthogonality  principle,  which  states  that  the  linear 
mtnimnri  mean  square  error  estimator  is  chosen  to  make  the  error  orthogonal 
to  the  data.  However,  the  solution  is  not  unique  because,  in  general, 
other  estimators  which  are  also  orthogonal  to  the  data  can  be  added  to 
the  solution  without  upsetting  the  orthogonality  condition. 

Maximum  likelihood,  minimum  variance,  and  maximum  entropy  spectral 
estimation  regularize  what  would  otherwise  be  an  ill-posed  problem  by  using 
statistical  techniques  to  estimate  the  solution  as  opposed  to  solving  for 
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an  exact  solution.  The  main  feature  of  a  statistical  regularization  scheme 
is  that  an  estimation  "rule"  is  prescribed  for  the  observed  data.  Given  the 
noisy  data,  one  simply  applies  the  estimation  "rule"  in  order  to  achieve 
the  estimate.  The  quality  of  the  estimate  depends  on  the  goodness  of  the 
estimation  rule  chosen  as  well  as  the  accuracy  of  the  a  priori  knowledge 
concerning  the  statistics  of  the  underlying  process.  This  leads  to  the 
replacement  of  the  exact  solution  of  the  equation  by  an  approximate 
"regularized"  solution.  Different  strategies,  both  optimal  and  suboptimal, 
may  be  suitable  for  different  problems.  However,  they  all  result  in  a 
statistical  regularization  of  the  problem  and,  in  general,  yield  estimates 
of  varying  quality. 

One  disadvantage  of  the  statistical  regularization  approach  is  that 
considerable  a  priori  Information  is  usually  needed  if  a  particular 
strategy  is  to  be  successfully  applied.  Nevertheless,  the  following 
advantages  hold: 

First,  the  probabilistic  approach  is  the  natural  way  to  describe 
measurement  noise  which  is  often  responsible  for  a  problem  becoming  ill- 
posed. 

Second,  the  probabilistic  method  allows  more  complete  use  of  previous 
experience,  by  including  it  in  the  a  priori  distributions. 

Third,  when  there  is  no  such  experience,  the  probabilistic  method 
still  allows  one  to  proceed  by  making  use  of  extremely  weak  assumptions 
about  the  unknown  processes. 

The  various  techniques  discussed  in  section  III  describe  various 
strategies  to  regularize  in  a  statistical  way  the  ill-posed  system 
identification  problem. 
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The  pencil-of- functions  regularizes  the  identification  problem  by 
generating  the  compact  set  Hc  to  which  the  solution  belongs.  This  is  achieved 
by  introduction  of  Che  operator  S  which  integrates  a  function  from  —  to  t. 

For  a  discrete-time  system  the  integral  reduces  to  a  sum.  The  pencil-of- 
function  method  makes  use  of  the  simple  sequence 

""I  "  {  Ai  ri  ] 

The  sequence  is  indexed  on  k  and  rJ.  -  exp  (^2  7^ f ±)  and  {A^  are  the  residues. 
Application  of  the  operator  S  on  ^  reduces  to 

STl  *  {  Ai  ri  /(  1  “  ri  )}‘ 

It  follows  that  [l  -  (  1  -  rt  )s]^  -  {Oj  .  It  can  also  be  shown  that 

the  operator  S  maps  the  space  onto  itself  while  preserving  the  poles  of  the 

sequence.  It  is  because  of  this  factor  that  the  poles  and  zeros  obtained  by 

this  method  are  extremely  stable,  reliable,  consistent  and  accurate. 

The  vectors  which  span  the  compact  set  to  which  the  pole  r^ 

belongs  are  generated  by  successive  applications  of  the  operator 

£_1  -  (  1  -  ri  )S  3  to^.  Note  that  in  this  operator  r^  is  an  unknown 

quantity.  The  r^'s  are  obtained  from  the  linear  dependence  of  the  spanning 

vectors  of  the  compact  set  R  .  The  detailed  mathematical  derivations  may  be 

c 

obtained  in  [36}. 
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VI.  PENCIL-OF-FUNCTIONS  METHOD 


A  useful  mathematical  entity  arises  by  combining  two  given  func¬ 
tions  defined  on  a  common  interval  together  with  a  scalar  parameter 
as 


f(t.X)  -  Xg(t)  +  h(t) 


(6.1) 


The  entity  f  is  called  a  pencil  of  functions  where  g(t)  and  h(t)  are 
parameterized  by  X.  For  example,  if  h(t)  is  composed  of 

h(.t)  “  A  expC-st) 

and 

t 

gCtl  *  hCt)dt  »  ~  empC-st)  [for  s  >  0] 

then  the  pencil  Ag(t)  +  h(t)  can  be  formed.  The  pencil  of  functions 
is  linearly  dependent  only  when  X  «  -s.  Therefore,  the  value  of  X 
can  be  computed  from  h(t)  and  its  integral  using  their  inner  product. 
The  main  result  thus  concerning  the  linear  dependence  of  the  pencil 
sets  is  that  the  parameter  X  satisfy  a  polynomial  equation.  The  de¬ 
tails  of  this  method  may  be  found  in  references  [36-38].  An 
advantage  of  this  technique  is  the  generation  of  the  successive 


36  V.K.  Jain,  "On  System  Identification  and  Approximation,"  Florida 
State  University,  Tallahassee,  Eng.  Res.  Rep.,  SS-I  1,  1970. 

37  V.K.  Jain,  "Filter  Analysis  by  Use  of  Pencil  of  Functions:  Part 

I  &  II,"  IEEE  Trans,  on  Circuits  and  Systems,  Vol.  CAS-21,  No.  6, 
September  1974. 

38  T.K.  Sarkar  et  al,  "Suboptimal  System  Approximation/Identification 
with  known  Error,"  Report  No.  AFWL-TR-77-200  on  Contract  F33615- 
77-C-2059,  September  1977. 
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integrals  of  the  function.  So  assuming  the  function  itself  is  in  L, 
space,  then  the  set  generated  by  the  successive  integrals  forms  a 
compact  set  in  L,  space.  This  is  how  the  ill-posed  system  identifica¬ 
tion  problem  is  regularized  by  converting  the  function  to  L^.  Since 
we  are  interested  first  in  finding  Che  values  of  A  for  the  pencil,  genera¬ 
tion  of  successive  integrals  of  the  function  forces  the  solution  to 
belong  to  the  compact  set  spanned  by  the  integrals.  That  is  why  it 
has  been  possible  to  estimate  an  error  bound  on  the  location  of  the 
poles  [36-38]. 

Another  added  advantage  of  formulating  the  problem  this  way  is 
that  the  effect  of  conventional  filtering  can  be  greatly  reduced.  This 
ran  be  achieved  through  successively  smoothing  the  function  by  passing 
it  through  a  band  pasB  filter  with  transfer  function,  [(as  +  b)/(cs  +  d  _)  ] , 
as  opposed  to  pure  integration.  The  integrator  is  then  a  special  case 
of  a  band  pass  filter  for  a  -  d  -  0  and  b  -  c  -  1.  This  can  increase  the 
frequency  resolution  of  the  indeatif ication  technique. 

Moreover,  as  the  poles  are  obtained  from  a  polynomial  equation 
whose  coefficients  form  the  minors  of  the  Grampian  of  the  pencil  of 
functions,  noise  corrections  can  be  done  easily.  Thus,  in  order  to 
make  the  estimate  of  the  poles  unbiased,  the  entries  in  Che  gram-matrix 
can  be  altered  in  a  systematic  way  to  yield  an  unbiased  estimate  for 
the  poles  [25,26], 

As  an  example  consider  the  transient  response  of  a  conducting 
pipe  tested  at  the  ATHAMAS-I  EMP  simulator.  The  conducting  pipe  is 
10  m  long  and  1  m  in  diameter.  Hence,  the  true  resonance  of  the  pipe 
is  expected  to  be  in  the  neighborhood  of  14  MHz.  Also,  the  pipe  has 
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been  excited  in  such  a  way  that  it  is  reasonable  to  expect  only  odd 
harmonics  at  the  scattered  fields.  The  data  which  have  been  measured 
are  the  integral  of  the  E-field  and  hence  is  available  in  terms  of  a 
voltage.  Thus,  in  addition  to  the  frequencies  of  the  conducting  pipe 
one  should  also  observe  a  very  dominant  low  frequency  pole.  The  same 
transient  data  as  depicted  in  Figure  2  {Figure  10  of  reference  [38]} 
is  used  for  analysis.  The  results  for  a  fifth  and  a  seventh  order 
system  are  as  follows: 

For  n  -  5,  the  poles  in  radians /sec  are 

(-0.0029  +  j 0.083)  x  109  (-13.33  MHz) 

(-0.0428  +  j 0.217)  x  109  (-35.20  MHz) 

(-0.0098  )  x  109  C-  1.56  MHz) 

For  n  -  7,  the  poles  in  radians/sec  are 

(-0.0058  +  j 0.084)  x  109  (-13.40.HHz) 

(-0.0270  +  j 0.219)  x  109  (-35.10  MHz) 

(-0.0270  +  j0.55Q)  x  109  (-87.60  MHz] 

(-0.0012  )  x  109  (-0.19  MHz) 

It  is  interesting  to  observe  that  the  real  pole  due  to  the  in¬ 
tegrator  has  been  obtained.  This  pole  is  a  very  dominant  pole  as  the 

data  was  recorded  after  having  passed  through  an  integrator. 

The  above  results  display  a  dynamic  range  of  approximately  1000:1  for 
the  values  of  poles  of  the  conducting  pipe. 

Next  the  data  were  differentiated  to  get  rid  of  the  undesirable 
dominant  pole  of  the  integrator.  The  differentiation  was  done  numerically. 
For  a  fourth  and  a  sixth  order  system  the  above  results  have  been 
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Transient  Response 


recalculated  as  follows : 


For  n  -  4,  the  poles  in  radians /sec  are 


(-0.0026  +  j 0.086)  x  10s 

(-13.70  MHz) 

(-0.0480  +  j0.235)  x  1 09 

(-37.47  MHz) 

For  n  -  6,  the  poles  in  radians /sec  are 

(-0.005  +  j 0.083)  x  109 

(-13.23  MHz) 

C-0.034  +  j 0.221)  x  109 

(-35.59  MHz) 

(-0,071  +  j 0.406)  x  109 

(-65.9  MEz) 

Here  a  good  approximation  to  the  poles  has  been  obtained  with  only 
four  poles.  Also,  there  seems  to  be  a  good  agreement  in  the  pole 
locations  obtained  from  the  original  integrated  data  and  the  numeri¬ 
cally  differentiated  data.  It  is  also  interesting  to  observe  that 
indeed  the  poles  are  occurring  approximately  at  odd  harmonics  of  the 
fundamental.  Hence,  the  pencil-of-functions  method  does  provide 
stable,  reliable,  consistent  and  accurate  values  of  poles  from  noise 
contaminated  measured  responses  of  electromagnetic  systems. 

In  Figure  3,  the  true  numerically  differentiated  data  is 
plotted  against  the  reconstructed  response  of  a  sixth  order  system. 

The  plot  has  been  normalized  to  unity  amplitude.  It  is  interesting  to 
note  that  there  is  a  close  agreement  even  In  the  very  early  times  of 
the  two  waveforms. 
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Tine  in  nanoseconds  -*» 


Fig. 3.  True  Response  Vs.  Reconstructed  Response  of  a  Sixth  Order  System  for 
a  10  a  Long  1  m  Diameter  Conducting  Ripe. 
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VII.  DISCUSSIONS 


The  previous  sections  demonstrate  that  the  use  of  a  proper 
mathematical  model  is  extremely  important  in  regularizing  an  ill- 
posed  problem.  It  has  also  been  shown  that  the  pencil -of-functions 
differs  radically  from  the  other  existing  schemes  of  finding  poles 
and  residues  of  a  finite  length  noise  contaminated  record.  It  is 
because  of  the  use  of  a  completely  different  regularizing  scheme 
that  analytical  error  counds  on  the  pole  locations  are  possible. 

This  is  why  reliable,  consistent,  stable  and  accurate  results  for 
the  poles  and  residues  have  been  obtained  for  this  method.  Thus, 
the  pencil-of -functions  method  shows  a  great  promise  for  the  analysis 
of  poles  and  residues  from  measured  transient  responses  of  a 
finite-size  conducting  body  in  free  space. 
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